Abstract

We establish the restricted isometry property for finite-dimensional Gabor systems, that is, for families of time-frequency shifts of a randomly chosen window function. We show that the $s$-th order restricted isometry constant of the associated $n \times n^2$ Gabor synthesis matrix is small provided that $s \le c\, n^{2/3} / \log^2 n$. This improves on previous estimates that exhibit quadratic scaling of $n$ in $s$. Our proof develops bounds for a corresponding chaos process.
1 Introduction and statements of results

Sparsity has become a key concept in applied mathematics and engineering. This is largely due to the empirical observation that a large number of real-world signals can be represented well by a sparse expansion in an appropriately chosen system of basic signals. Compressive sensing [§1 fTTl [T5I UM [2T1 144] predicts that a small number of linear samples suffices to capture all the information in a sparse vector and that, furthermore, the sparse vector can be recovered from these samples using efficient algorithms. This discovery has a number of potential applications in signal processing, as well as in other areas of science and technology.

Linear data acquisition is described by a measurement matrix. The restricted isometry property (RIP) [T21 Q21 [2U [33] is by now a standard tool for studying how efficiently the measurement matrix captures information about sparse signals. The RIP also streamlines the analysis of signal reconstruction algorithms, including $\ell_1$-minimization, greedy and iterative algorithms. To date, there are no deterministic constructions of measurement matrices available that satisfy the RIP with the optimal scaling behavior; see, for example, the discussions in [331 Sec. 2.5] and [2D Sec. 5.1]. In contrast, a variety of random measurement matrices exhibit the RIP with optimal scaling, including Gaussian matrices and Rademacher matrices [SI 1201 133 US].

Although Gaussian random matrices are optimal for sparse recovery [211 [25], they have limited use in practice because many applications impose structure on the matrix. Furthermore, recovery algorithms are significantly more efficient when the matrix admits a fast matrix-vector multiplication. For example, random sets of rows from a discrete Fourier transform matrix model the measurement process in MRI and other applications. These random partial Fourier matrices lead to fast recovery algorithms because they can utilize the FFT. It is known that a random partial Fourier matrix satisfies a near-optimal RIP [TSJ [49] 32l [44] with high probability; see also [44, 48] for some generalizations.

This paper studies another type of structured random matrix that arises from time-frequency analysis and has potential applications to the channel identification problem [41] in wireless communications and sonar [35, 50], as well as in radar [30]. The columns of the considered $n \times n^2$ matrix consist of all discrete time-frequency shifts of a random vector. Previous analysis of this matrix has provided bounds for the coherence [41], as well as nonuniform sparse recovery guarantees using $\ell_1$-minimization [45]. However, the best bounds on the restricted isometry constants available so far were derived from coherence bounds [41] and, therefore, exhibit a highly non-optimal quadratic scaling of $n$ in the sparsity $s$. This paper dramatically improves on these bounds. Such an improvement is important because the nonuniform recovery guarantees in [45] apply only to $\ell_1$-minimization, they do not provide stability of reconstruction, and they do not show the existence of a single time-frequency structured measurement matrix that is able to recover all sufficiently sparse vectors. It is also of theoretical interest whether Gabor systems, that is, the columns of our measurement matrix, can possess the restricted isometry property. Nevertheless, our results still fall short of the optimal scaling that one might hope for.

Our approach is similar to the recent restricted isometry analysis for partial random circulant matrices in [46]. Here, too, we bound a chaos process of order 2 by means of a Dudley-type inequality for such processes due to Talagrand [53]. This requires estimating covering numbers of the set of unit-norm $s$-sparse vectors with respect to two different metrics induced by the process. In contrast to [46], the specific structure of our problem does not allow us to reduce to the Fourier case and to apply the covering number estimates shown in [49].

This paper is organized as follows. In Section 1.1 we recall central concepts in compressive sensing. Section 1.2 introduces the time-frequency structured measurement matrices that are considered in this paper, and we state our main result, Theorem 1. Remarks on applications in wireless communications and radar, as well as the relation of this paper to previous work, are given in Sections 1.3 and 1.4, respectively. Sections 2, 3, and 4 provide the proof of Theorem 1.

1.1 Compressive Sensing 

In general, reconstructing $x = (x_1, \dots, x_N)^T \in \mathbb{C}^N$ from

$$y = Ax \in \mathbb{C}^n, \tag{1}$$

where $A \in \mathbb{C}^{n \times N}$ and $n \ll N$ (in this paper, we have $N = n^2$), is impossible without substantial a priori information on $x$. In compressive sensing, the assumption that $x$ is $s$-sparse, that is, $\|x\|_0 := \#\{\ell : x_\ell \neq 0\} \le s$ for some $s \ll N$, is introduced to ensure uniqueness and efficient recoverability of $x$. More generally, under the assumption that $x$ is well approximated by a sparse vector, the question is posed whether an optimally sparse approximation to $x$ can be found efficiently.

Reconstruction of a sparse vector $x$ by means of the $\ell_0$-minimization problem,

$$\min_z \|z\|_0 \quad \text{subject to} \quad y = Az,$$

is NP-hard [36] and therefore not tractable. Consequently, a number of alternatives to $\ell_0$-minimization, for example, greedy algorithms [5, 23, 37, 54, 55], have been proposed in the literature. The most popular approach utilizes $\ell_1$-minimization |111 1151 [T9], that is, the convex program

$$\min_z \|z\|_1 \quad \text{subject to} \quad y = Az, \tag{2}$$

is solved, where $\|z\|_1 = |z_1| + |z_2| + \dots + |z_N|$ denotes the usual $\ell_1$ vector norm.

To guarantee recoverability of the sparse vector $x$ in (1) by means of $\ell_1$-minimization and greedy algorithms, it suffices to establish the restricted isometry property (RIP) of the so-called measurement matrix $A$: define the restricted isometry constant $\delta_s$ of an $n \times N$ matrix $A$ to be the smallest nonnegative number that satisfies

$$(1 - \delta_s)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_s)\|x\|_2^2 \quad \text{for all } x \text{ with } \|x\|_0 \le s. \tag{3}$$

In words, statement (3) requires that all column submatrices of $A$ with at most $s$ columns are well-conditioned. Informally, $A$ is said to satisfy the RIP of order $s$ when $\delta_s$ is "small".
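For very small matrices, definition (3) can be evaluated directly. The following Python sketch is only an illustration under our own assumptions (NumPy is available; `restricted_isometry_constant` is a hypothetical helper). It uses the equivalent formulation $\delta_s = \max_{|S| = s} \|A_S^* A_S - I\|_{2\to 2}$ over column subsets $S$. Because it enumerates all $\binom{N}{s}$ supports, its cost is exponential in $s$, so it is useful only for toy sizes.

```python
import itertools

import numpy as np


def restricted_isometry_constant(A: np.ndarray, s: int) -> float:
    """Brute-force delta_s = max over |S| = s of ||A_S^* A_S - I||_{2->2}.

    Supports of size exactly s suffice: by eigenvalue interlacing, the Gram
    matrix of a smaller support has its eigenvalues inside those of any
    size-s superset.
    """
    N = A.shape[1]
    delta = 0.0
    for S in itertools.combinations(range(N), s):
        cols = A[:, list(S)]
        eig = np.linalg.eigvalsh(cols.conj().T @ cols)  # ascending real eigenvalues
        delta = max(delta, abs(eig[0] - 1.0), abs(eig[-1] - 1.0))
    return float(delta)


rng = np.random.default_rng(0)
n, N, s = 8, 16, 2
A = rng.standard_normal((n, N)) / np.sqrt(n)   # Gaussian matrix, normalized by n^{-1/2}
print(restricted_isometry_constant(A, s))
print(restricted_isometry_constant(np.eye(4), 2))  # an orthonormal system has delta_s = 0
```

The sparsity levels treated in this paper are far beyond the reach of such an exhaustive search, which is one reason why probabilistic bounds are needed.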
Now, if the matrix $A$ obeys (3) with

$$\delta_{\kappa s} < \delta^* \tag{4}$$

for suitable constants $\kappa \ge 1$ and $\delta^* < 1$, then many algorithms exactly recover any $s$-sparse vector $x$ from the measurements $y = Ax$. Moreover, if $x$ can be well approximated by an $s$-sparse vector, then for noisy observations

$$y = Ax + e, \quad \text{where } \|e\|_2 \le \tau,$$

these algorithms return a reconstruction $\hat{x}$ that satisfies an error bound of the form

$$\|x - \hat{x}\|_2 \le C_1 \frac{\sigma_s(x)_1}{\sqrt{s}} + C_2 \tau, \tag{5}$$

where $\sigma_s(x)_1 = \inf_{\|z\|_0 \le s} \|x - z\|_1$ denotes the error of best $s$-term approximation in $\ell_1$ and $C_1, C_2$ are positive constants. For illustration, we include Table 1, which lists available values for the constants $\kappa$ and $\delta^*$ in (4) that guarantee (5) for several algorithms, along with respective references.
Table 1: Values of the constants $\kappa$ and $\delta^*$ in (4) that guarantee success for various recovery algorithms.



For example, Gaussian random matrices, that is, matrices $A$ that have independent, normally distributed entries with mean zero and variance one, have been shown [3, 13, 34] to satisfy the following: the restricted isometry constants of $\frac{1}{\sqrt{n}} A$ satisfy $\delta_s \le \delta$ with high probability provided that

$$n \ge C \delta^{-2} s \log(N/s).$$

That is, the number $n$ of Gaussian measurements required to reconstruct an $s$-sparse signal of length $N$ is linear in the sparsity and logarithmic in the ambient dimension. See [3l 1131 1341 1211 144] for precise statements and extensions to Bernoulli and subgaussian matrices. It follows from lower estimates of Gelfand widths that this bound on the required number of samples is optimal (TTl [25l [26], that is, the log-factor must be present.

As discussed above, no deterministic construction of a measurement matrix is known that provides the RIP with optimal scaling of the recoverable sparsity $s$ in the number of measurements $n$. In fact, all available proofs of the RIP with close to optimal scaling require the measurement matrix to contain some randomness. In Table 2 we list the Shannon entropy (in bits) of various random matrices along with the available RIP estimates. Compared to Gaussian random matrices, the Gabor synthesis measurement matrices constructed in this paper introduce only a small amount of randomness: the measurement matrix depends only on the so-called Gabor window, a random vector of length $n$, which can be chosen to be a normalized copy of a Rademacher vector. Moreover, the random Gabor matrix provably provides scaling of $s$ roughly in $n^{2/3}$, which significantly improves on known deterministic constructions. Clearly, such scaling falls short of the optimal one, but we expect that it is possible to establish linear scaling of $s$ in $n$ up to log-factors, similar to Gaussian matrices or partial random Fourier matrices. However, such an improvement seems to require more powerful methods for estimating chaos processes than are presently available.
| $n \times N$ measurement matrix | Shannon entropy | Scaling of $s$ in $n$ | References |
|---|---|---|---|
| Gabor synthesis matrix, Rademacher window | $n$ | $s \le C n^{2/3} / \log^2 n$ | this paper |

Table 2: List of measurement matrices that have been proven to be RIP, scaling of sparsity $s$ in the number of measurements $n$, and the respective Shannon entropy of the (random) matrix.



1.2 Time-frequency structured measurement matrices 

In this paper, we provide probabilistic estimates of the restricted isometry constants for matrices whose columns are time-frequency shifts of a randomly chosen vector. To define these matrices, we let $T$ denote the cyclic shift, also called translation operator, and $M$ the modulation operator, or frequency shift operator, on $\mathbb{C}^n$. They are defined by

$$(Th)_q = h_{q \ominus 1} \quad \text{and} \quad (Mh)_q = e^{2\pi i q/n} h_q = \omega^q h_q, \tag{6}$$

where $\ominus$ denotes subtraction modulo $n$ and $\omega = e^{2\pi i/n}$. Note that

$$(T^k h)_q = h_{q \ominus k} \quad \text{and} \quad (M^\ell h)_q = e^{2\pi i \ell q/n} h_q = \omega^{\ell q} h_q. \tag{7}$$

The operators $\pi(\lambda) = M^\ell T^k$, $\lambda = (k, \ell)$, are called time-frequency shifts, and the system $\{\pi(\lambda) : \lambda \in \mathbb{Z}_n \times \mathbb{Z}_n\}$, $\mathbb{Z}_n = \{0, 1, \dots, n-1\}$, of all time-frequency shifts forms a basis of the matrix space $\mathbb{C}^{n \times n}$ [021 EU-].
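The definitions (6) and (7) can be checked numerically. The Python sketch below assumes NumPy, and the helper names `translation`, `modulation`, and `tf_shift` are our own rather than the paper's. It builds $T$, $M$, and $\pi(k, \ell) = M^\ell T^k$ as explicit matrices, verifies (7), and confirms that the $n^2$ time-frequency shifts are mutually orthogonal in the trace inner product $\operatorname{Tr}(P^* Q)$. This orthogonality is one way to see that they form a basis of $\mathbb{C}^{n \times n}$.

```python
import numpy as np


def translation(n: int) -> np.ndarray:
    """Cyclic shift T with (T h)_q = h_{q-1 mod n}."""
    return np.roll(np.eye(n), 1, axis=0)


def modulation(n: int) -> np.ndarray:
    """Modulation M with (M h)_q = omega^q h_q, omega = exp(2 pi i / n)."""
    return np.diag(np.exp(2j * np.pi * np.arange(n) / n))


def tf_shift(k: int, l: int, n: int) -> np.ndarray:
    """Time-frequency shift pi(k, l) = M^l T^k."""
    return np.linalg.matrix_power(modulation(n), l) @ np.linalg.matrix_power(translation(n), k)


n, k, l = 5, 2, 3
rng = np.random.default_rng(0)
h = rng.standard_normal(n) + 1j * rng.standard_normal(n)
q = np.arange(n)
omega = np.exp(2j * np.pi / n)
# (7): (T^k h)_q = h_{q-k mod n} and (M^l h)_q = omega^{l q} h_q
print(np.allclose(np.linalg.matrix_power(translation(n), k) @ h, h[(q - k) % n]))  # True
print(np.allclose(np.linalg.matrix_power(modulation(n), l) @ h, omega ** (l * q) * h))  # True

# Tr(pi(lambda)^* pi(mu)) = n if lambda == mu and 0 otherwise; orthogonal nonzero
# matrices are linearly independent, so the n^2 shifts span C^{n x n}.
shifts = [tf_shift(kk, ll, n) for ll in range(n) for kk in range(n)]
gram = np.array([[np.trace(P.conj().T @ Q) for Q in shifts] for P in shifts])
print(np.allclose(gram, n * np.eye(n * n)))  # True
```

Each explicit shift matrix needs $O(n^2)$ storage, so this approach is meant only for small $n$.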

We choose $\varepsilon \in \mathbb{C}^n$ to be a Rademacher or Steinhaus sequence, that is, a vector of independent random variables taking the values $+1$ and $-1$ with equal probability, respectively taking values uniformly distributed on the complex torus $S^1 = \{z \in \mathbb{C} : |z| = 1\}$. The normalized window is

$$g = n^{-1/2} \varepsilon,$$

and the set

$$\{\pi(\lambda) g : \lambda \in \mathbb{Z}_n \times \mathbb{Z}_n\} \tag{8}$$

is called a full Gabor system with window $g$ [28]. The matrix $\Psi_g \in \mathbb{C}^{n \times n^2}$ whose columns list the members $\pi(\lambda) g$, $\lambda \in \mathbb{Z}_n \times \mathbb{Z}_n$, of the Gabor system is referred to as the Gabor synthesis matrix [T51 [521 HP]. Note that $\Psi_g$ allows for fast matrix-vector multiplication algorithms based on the FFT. The main result of this paper addresses the restricted isometry constants of $\Psi_g$. Below, $\mathbb{E}$ denotes expectation and $\mathbb{P}$ the probability of an event.
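The remark on fast matrix-vector multiplication can be made concrete with a short sketch. It assumes NumPy and uses our own column ordering: column $\ell n + k$ of $\Psi_g$ holds $\pi(k, \ell) g$, which matches the block layout $(T^q \,|\, MT^q \,|\, \cdots)$ used for $A_q$ in Section 2. Since $(\Psi_g x)_m = \sum_k g_{m-k} \sum_\ell x_{(k,\ell)} \omega^{\ell m}$, the inner sums amount to $n$ inverse FFTs of length $n$. This gives $O(n^2 \log n)$ operations instead of the $O(n^3)$ cost of a dense product. The function names are hypothetical.

```python
import numpy as np


def gabor_synthesis_matrix(g: np.ndarray) -> np.ndarray:
    """Dense n x n^2 matrix; column l*n + k holds pi(k, l) g = M^l T^k g."""
    n = g.size
    omega = np.exp(2j * np.pi / n)
    m = np.arange(n)
    # (M^l T^k g)_m = omega^{l m} g_{m-k}; np.roll(g, k)[m] == g[(m - k) % n]
    return np.stack([omega ** (l * m) * np.roll(g, k)
                     for l in range(n) for k in range(n)], axis=1)


def gabor_synthesis_fft(g: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Psi_g x via n inverse FFTs of length n: O(n^2 log n) instead of O(n^3)."""
    n = g.size
    X = x.reshape(n, n)                      # X[l, k] = x_{(k, l)} for column l*n + k
    F = n * np.fft.ifft(X, axis=0)           # F[m, k] = sum_l x_{(k, l)} omega^{l m}
    m = np.arange(n)[:, None]
    k = np.arange(n)[None, :]
    return np.sum(g[(m - k) % n] * F, axis=1)   # (Psi_g x)_m = sum_k g_{m-k} F[m, k]


rng = np.random.default_rng(1)
n = 7
eps = rng.choice([-1.0, 1.0], size=n)        # Rademacher sequence
g = eps / np.sqrt(n)                          # normalized window g = n^{-1/2} eps
Psi = gabor_synthesis_matrix(g)
x = rng.standard_normal(n * n) + 1j * rng.standard_normal(n * n)
print(Psi.shape, np.allclose(Psi @ x, gabor_synthesis_fft(g, x)))  # (7, 49) True
print(np.allclose(np.linalg.norm(Psi, axis=0), 1.0))               # True: unit-norm columns
```

The dense matrix serves only as a reference. In an iterative recovery algorithm, one would call only the FFT-based routine, together with an analogous routine for $\Psi_g^*$.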

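Theorem 1 Let $\Psi_g \in \mathbb{C}^{n \times n^2}$ be a draw of the random Gabor synthesis matrix with normalized Steinhaus or Rademacher generating vector.

(a) The expectation of the restricted isometry constant $\delta_s$ of $\Psi_g$, $s \le n$, satisfies

$$\mathbb{E}\,\delta_s \le \max\left\{ C_1 \sqrt{\frac{s^{3/2}}{n}}\, \log s \, \sqrt{\log n},\; C_2 \frac{s^{3/2} \log^{3/2} n}{n} \right\}, \tag{9}$$

where $C_1, C_2 > 0$ are universal constants.

(b) For $0 \le \lambda \le 1$, we have

$$\mathbb{P}\big(\delta_s \ge \mathbb{E}[\delta_s] + \lambda\big) \le e^{-\lambda^2/\sigma^2}, \quad \text{where } \sigma^2 = C_3 \frac{s^{3/2} \log n \, \log^2 s}{n}, \tag{10}$$

with $C_3 > 0$ being a universal constant.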

With slight variations of the proof one can show similar statements for normalized Gaussian or 
subgaussian random windows g. 

Roughly speaking, $\Psi_g$ satisfies the RIP of order $s$ with high probability if $n \ge C s^{3/2} \log^3(n)$, or equivalently if

$$s \le c\, n^{2/3} / \log^2 n.$$

We expect that this is not the optimal estimate, but improving on it seems to require more sophisticated techniques than those pursued in this paper. There are known examples [331 153j for which the central tool in this paper, the Dudley-type inequality for chaos processes stated in Theorem 3, is not sharp. We may well be facing one of these cases here.

Numerical tests illustrating the use of $\Psi_g$ for compressive sensing are presented in [IT]. They illustrate that, empirically, $\Psi_g$ performs very similarly to a Gaussian matrix.

1.3 Application in wireless communications and radar 

An important task in wireless communications is to identify the communication channel at hand, that is, the channel operator, by probing it with a small number of known transmit signals, ideally a single probing signal. A common finite-dimensional model for the channel operator, which combines digital (discrete) to analog conversion, the analog channel, and analog to digital conversion, is given by

$$\Gamma = \sum_{\lambda \in \mathbb{Z}_n \times \mathbb{Z}_n} x_\lambda\, \pi(\lambda).$$

Time shifts model delay due to multipath propagation, while frequency shifts model the Doppler effect due to moving transmitters, receivers, and/or scatterers. Physical considerations often suggest that $x$ is rather sparse, as the number of present scatterers can be assumed to be small in most cases. The same model is also used in sonar [35, 50] and radar [30].

Our task is to identify the coefficient vector $x$ from a single input-output pair $(g, \Gamma g)$. In other words, we need to reconstruct $\Gamma \in \mathbb{C}^{n \times n}$, or equivalently $x$, from its action $y = \Gamma g$ on a single vector $g$. Writing

$$y = \Gamma g = \sum_{\lambda} x_\lambda\, \pi(\lambda) g = \Psi_g x \tag{11}$$

with unknown but sparse $x$, we arrive at a compressive sensing problem. In this setup, we have the freedom to choose $g$, and we may choose it as a random Rademacher or Steinhaus sequence. Then the restricted isometry property of $\Psi_g$, as shown in Theorem 1, ensures recovery of sufficiently sparse $x$, and hence of the associated operator $\Gamma$.
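The identity $y = \Gamma g = \Psi_g x$ in (11) follows from linearity, and it can be sanity-checked on a toy channel. In the sketch below (NumPy assumed), the sparse coefficient vector is random and serves only as an illustration, not as a physical channel model. The sketch assembles $\Gamma$ from $s$ active time-frequency shifts and compares $\Gamma g$ with $\Psi_g x$ for a normalized Steinhaus window. The helper `tf_shift` and the column ordering $\lambda = (k, \ell) \mapsto \ell n + k$ are our own choices.

```python
import numpy as np


def tf_shift(k: int, l: int, n: int) -> np.ndarray:
    """pi(k, l) = M^l T^k as an explicit n x n matrix."""
    omega = np.exp(2j * np.pi / n)
    T_k = np.roll(np.eye(n), k, axis=0)              # (T^k h)_q = h_{q-k}
    M_l = np.diag(omega ** (l * np.arange(n)))       # (M^l h)_q = omega^{l q} h_q
    return M_l @ T_k


rng = np.random.default_rng(2)
n, s = 11, 3
# Illustrative sparse coefficients: s active (delay k, Doppler l) pairs, column l*n + k.
x = np.zeros(n * n, dtype=complex)
x[rng.choice(n * n, size=s, replace=False)] = rng.standard_normal(s) + 1j * rng.standard_normal(s)
Gamma = sum(x[l * n + k] * tf_shift(k, l, n) for l in range(n) for k in range(n))

g = np.exp(2j * np.pi * rng.random(n)) / np.sqrt(n)  # normalized Steinhaus window
Psi = np.stack([tf_shift(k, l, n) @ g for l in range(n) for k in range(n)], axis=1)
y = Gamma @ g                                        # single probing measurement
print(np.allclose(y, Psi @ x))                       # True: y = Gamma g = Psi_g x
```

Recovering $x$ from $y$ would then require a sparse recovery method such as (2). This step is not shown here.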

Recovery of the sparse $x$ in (11) can also be interpreted as finding a sparse time-frequency representation of a given $y$ with respect to the window $g$. From an application point of view, though, the vectors considered here are not well suited to describe meaningful sparse time-frequency representations, since all windows $g$ that are known to guarantee the RIP of $\Psi_g$ are very poorly localized both in time and in frequency.

1.4 Relation with previous work 

Time-frequency structured matrices $\Psi_g$ appeared in the study of frames with (near-)optimal coherence. Recall that the coherence of a matrix $A = (a_1 | \dots | a_N)$ with normalized columns $\|a_\ell\|_2 = 1$ is defined as

$$\mu := \max_{\ell \neq k} |\langle a_\ell, a_k \rangle|.$$

Choosing the Alltop window [UGH] $g \in \mathbb{C}^n$ with entries $g_q = n^{-1/2} e^{2\pi i q^3/n}$ for $n \ge 5$ prime yields $\Psi_g$ with coherence

$$\mu = \frac{1}{\sqrt{n}}.$$

Due to the general lower bound $\mu \ge \sqrt{\frac{N - n}{n(N - 1)}}$ for an $n \times N$ matrix [5T], this coherence is almost optimal. Together with the bound $\delta_s \le (s - 1)\mu$ we obtain

$$\delta_s \le \frac{s - 1}{\sqrt{n}}.$$

This requires a scaling $s \le c\sqrt{n}$ to achieve sufficiently small RIP constants and sparse recovery, which clearly is worse than the main result of this paper.
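The coherence claim for the Alltop window can be reproduced numerically for a small prime. The sketch below assumes NumPy; `gabor_matrix` and `coherence` are illustrative helpers that use our own column ordering. It computes $\mu$ for $n = 7$ and compares it with $1/\sqrt{n}$ and with the lower bound $\sqrt{(N - n)/(n(N - 1))}$ for $N = n^2$, which simplifies to $1/\sqrt{n + 1}$.

```python
import numpy as np


def gabor_matrix(g: np.ndarray) -> np.ndarray:
    """Columns pi(k, l) g = M^l T^k g, stored at column l*n + k."""
    n = g.size
    omega = np.exp(2j * np.pi / n)
    m = np.arange(n)
    return np.stack([omega ** (l * m) * np.roll(g, k)
                     for l in range(n) for k in range(n)], axis=1)


def coherence(A: np.ndarray) -> float:
    """Largest absolute inner product between distinct (unit-norm) columns."""
    G = np.abs(A.conj().T @ A)
    np.fill_diagonal(G, 0.0)
    return float(G.max())


n = 7                                                            # prime, n >= 5
q = np.arange(n)
g_alltop = np.exp(2j * np.pi * (q ** 3 % n) / n) / np.sqrt(n)   # Alltop window
mu = coherence(gabor_matrix(g_alltop))
welch = np.sqrt((n ** 2 - n) / (n * (n ** 2 - 1)))              # equals 1/sqrt(n + 1)
print(mu, 1 / np.sqrt(n), welch)
```

Up to rounding, $\mu$ equals $1/\sqrt{7}$, which is only slightly above the lower bound $1/\sqrt{8}$. This matches the statement that the Alltop coherence is almost optimal.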

The coherence of $\Psi_g$ with Steinhaus sequence $g$ is estimated in [41] by

$$\mu \le c \sqrt{\frac{\log(n/\varepsilon)}{n}},$$

holding with probability at least $1 - \varepsilon$. As before, this does not give better than quadratic scaling of $n$ in $s$ in order to have small RIP constants $\delta_s$.

The following nonuniform recovery result for $\ell_1$-minimization with $\Psi_g$ and Steinhaus sequence $g$ was derived in [45].

Theorem 2 Let $x \in \mathbb{C}^{n^2}$ be $s$-sparse. Choose a Steinhaus sequence $g$ at random. Then with probability at least $1 - \varepsilon$, the vector $x$ can be recovered from $y = \Psi_g x$ via $\ell_1$-minimization provided

$$s \le c \frac{n}{\log(n/\varepsilon)}.$$

Clearly, the (optimal) almost linear scaling of $n$ in $s$ of this estimate is better than the RIP estimate of the main Theorem 1. However, the conclusion is weaker than what can be derived using the restricted isometry property: recovery in Theorem 2 is nonuniform in the sense that a given $s$-sparse vector can be recovered with high probability from a random draw of the matrix $\Psi_g$. It is not stated that a single matrix $\Psi_g$ can recover all $s$-sparse vectors simultaneously. Moreover, nothing is said about the stability of recovery, whereas small RIP constants imply (5). Therefore, our main Theorem 1 is of high interest and importance, despite the better scaling in Theorem 2. Moreover, we expect that an improvement of the RIP estimate is possible, although it is presently not clear how this can be achieved.

Partial random circulant matrices are a different, but closely related, measurement matrix, studied in [251 331 331 SHI]. They model convolution with a random vector followed by subsampling on an arbitrary (deterministic) set. The best estimate of the restricted isometry constants $\delta_s$ of such an $n \times N$ matrix available so far, in [46], requires $n \ge c(s \log N)^{3/2}$, similarly to the main result of this paper. The corresponding analysis also requires bounding a chaos process, which is likewise achieved by the Dudley-type bound of Theorem 3 below. Nonuniform recovery guarantees for partial random circulant matrices similar to Theorem 2 are contained in 221 [33]. The analysis of circulant matrices benefits from a simplified arithmetic in the Fourier domain, a tool not available to us in the case of Gabor synthesis matrices. Hence, the analysis presented here is more involved.



2 Expectation of the restricted isometry constants 

We first estimate the expectation of the restricted isometry constants of the random Gabor synthesis matrix, that is, we prove Theorem 1(a). To this end, we first rewrite the restricted isometry constants $\delta_s$. Let $T = T_s = \{x \in \mathbb{C}^{n^2} : \|x\|_2 = 1, \|x\|_0 \le s\}$. Introduce the following semi-norm on Hermitian matrices $A$,

$$|||A|||_s = \sup_{x \in T_s} |x^* A x|.$$

Then the restricted isometry constants of $\Psi = \Psi_g$ can be written as

$$\delta_s = |||\Psi^* \Psi - I|||_s,$$

where $I$ denotes the identity matrix. Observe that the Gabor synthesis matrix $\Psi_g$ takes the form

$$\Psi_g = n^{-1/2} \left( \varepsilon_0 A_0 + \varepsilon_1 A_1 + \dots + \varepsilon_{n-1} A_{n-1} \right),$$

where $A_0 = (I \,|\, M \,|\, M^2 \,|\, \cdots \,|\, M^{n-1})$, $A_1 = (T \,|\, MT \,|\, M^2 T \,|\, \cdots \,|\, M^{n-1} T)$, and so on. In short, for $q \in \mathbb{Z}_n$,

$$A_q = (T^q \,|\, M T^q \,|\, M^2 T^q \,|\, \cdots \,|\, M^{n-1} T^q). \tag{12}$$

Our analysis in this section employs the representation

$$\Psi_g^* \Psi_g = \frac{1}{n} \sum_{q', q = 0}^{n-1} \overline{\varepsilon_{q'}}\, \varepsilon_q\, A_{q'}^* A_q.$$

Using (29) below and $|\varepsilon_q| = 1$, it follows that

$$\Psi_g^* \Psi_g - I = \frac{1}{n} \sum_{q', q = 0}^{n-1} \overline{\varepsilon_{q'}}\, \varepsilon_q\, W_{q', q}, \tag{13}$$

where, for notational simplicity, we use here and in the following $W_{q', q} = A_{q'}^* A_q$ for $q' \neq q$ and $W_{q, q} = 0$. We employ the matrix $B(x) \in \mathbb{C}^{n \times n}$, $x \in T_s$, given by the matrix entries

$$B(x)_{q', q} = x^* W_{q', q}\, x. \tag{14}$$

Then we have

$$n\, \mathbb{E}\,\delta_s = \mathbb{E} \sup_{x \in T_s} |Y_x| = \mathbb{E} \sup_{x \in T_s} |Y_x - Y_0|, \tag{15}$$

where

$$Y_x = \varepsilon^* B(x) \varepsilon = \sum_{q' \neq q} \overline{\varepsilon_{q'}}\, \varepsilon_q\, x^* A_{q'}^* A_q x \tag{16}$$

and $x \in T_s = \{x \in \mathbb{C}^{n^2} : \|x\|_2 \le 1, \|x\|_0 \le s\}$. Enlarging $T_s$ in this way does not change the supremum, since $Y_x$ is homogeneous of degree 2 in $x$. The enlarged set contains $x = 0$, for which $Y_0 = 0$. A process of the type (16) is called a Rademacher or Steinhaus chaos process of order 2. To bound such a process, we use the following theorem; see, for example, [33, Theorem 11.22] or [53, Theorem 2.5.2], where it is stated for Gaussian processes and in terms of majorizing measure (generic chaining) conditions. The formulation below requires the operator norm $\|A\|_{2 \to 2} = \max_{\|x\|_2 = 1} \|Ax\|_2$ and the Frobenius norm $\|A\|_F = \operatorname{Tr}(A^* A)^{1/2} = \big( \sum_{j,k} |A_{j,k}|^2 \big)^{1/2}$, where $\operatorname{Tr}(A)$ denotes the trace of a matrix $A$.

Theorem 3 Let $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)^T$ be a Rademacher or Steinhaus sequence, and let

$$Y_x := \varepsilon^* B(x) \varepsilon = \sum_{q', q = 1}^{n} \overline{\varepsilon_{q'}}\, \varepsilon_q\, B(x)_{q', q}$$

be an associated chaos process of order 2, indexed by $x \in T$, where we additionally assume $B(x)$ to be Hermitian with zero diagonal, that is, $B(x)_{q, q} = 0$ and $B(x)_{q', q} = \overline{B(x)_{q, q'}}$. We define two (pseudo-)metrics on $T$,

$$d_1(x, y) = \|B(x) - B(y)\|_{2 \to 2}, \qquad d_2(x, y) = \|B(x) - B(y)\|_F.$$

Let $N(T, d_i, u)$ be the minimum number of balls of radius $u$ in the metric $d_i$ needed to cover $T$. Then there exists a universal constant $K > 0$ such that, for an arbitrary $x_0 \in T$,

$$\mathbb{E} \sup_{x \in T} |Y_x - Y_{x_0}| \le K \max\left\{ \int_0^\infty \log N(T, d_1, u)\, du,\; \int_0^\infty \sqrt{\log N(T, d_2, u)}\, du \right\}. \tag{17}$$

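Proof: For a Rademacher sequence, the theorem is stated in [46, Proposition 2.2]. If $\varepsilon$ is a Steinhaus sequence and $B$ a Hermitian matrix, then

$$\varepsilon^* B \varepsilon = \operatorname{Re}(\varepsilon^* B \varepsilon) = \operatorname{Re}(\varepsilon)^T \operatorname{Re}(B) \operatorname{Re}(\varepsilon) - \operatorname{Re}(\varepsilon)^T \operatorname{Im}(B) \operatorname{Im}(\varepsilon) + \operatorname{Im}(\varepsilon)^T \operatorname{Im}(B) \operatorname{Re}(\varepsilon) + \operatorname{Im}(\varepsilon)^T \operatorname{Re}(B) \operatorname{Im}(\varepsilon).$$

By decoupling (see, for example, [39, Theorem 3.1.1]), we have, with $\varepsilon'$ denoting an independent copy of $\varepsilon$,

$$\begin{aligned}
\mathbb{E} \sup_{x \in T} |\operatorname{Re}(\varepsilon)^T \operatorname{Im}(B(x)) \operatorname{Im}(\varepsilon)| &\le 8\, \mathbb{E} \sup_{x \in T} |\operatorname{Re}(\varepsilon)^T \operatorname{Im}(B(x)) \operatorname{Im}(\varepsilon')| \\
&\le 8\, \mathbb{E} \sup_{x \in T} |\xi^T \operatorname{Im}(B(x)) \operatorname{Im}(\varepsilon')| \le 8\, \mathbb{E} \sup_{x \in T} |\xi^T \operatorname{Im}(B(x)) \xi'|,
\end{aligned}$$

where $\xi, \xi'$ denote independent Rademacher sequences. The second and third inequalities follow from the contraction principle [33, Theorem 4.4] (and symmetry of $\operatorname{Re}(\varepsilon_q)$, $\operatorname{Im}(\varepsilon_q)$), first applied conditionally on $\varepsilon'$ and then conditionally on $\xi$ (note that $|\operatorname{Re}(\varepsilon_q)| \le 1$ and $|\operatorname{Im}(\varepsilon_q)| \le 1$ for all realizations of $\varepsilon_q$). Treating the other three terms in the same way and using the triangle inequality, we get

$$\mathbb{E} \sup_{x \in T} |Y_x - Y_{x_0}| \le 16\, \mathbb{E} \sup_{x \in T} |\xi^T (\operatorname{Re}(B(x)) - \operatorname{Re}(B(x_0))) \xi'| + 16\, \mathbb{E} \sup_{x \in T} |\xi^T (\operatorname{Im}(B(x)) - \operatorname{Im}(B(x_0))) \xi'|. \tag{18}$$

Further note that $\|\operatorname{Im}(B(x)) - \operatorname{Im}(B(y))\|_F, \|\operatorname{Re}(B(x)) - \operatorname{Re}(B(y))\|_F \le \|B(x) - B(y)\|_F$. Similarly, writing $B(x) - B(y)$ as a $2n \times 2n$ real block matrix acting on $\mathbb{R}^{2n}$, we see that $\|\operatorname{Im}(B(x)) - \operatorname{Im}(B(y))\|_{2 \to 2}, \|\operatorname{Re}(B(x)) - \operatorname{Re}(B(y))\|_{2 \to 2} \le \|B(x) - B(y)\|_{2 \to 2}$. Furthermore, the statement for Rademacher chaos processes also holds for decoupled chaos processes of the form above. (Indeed, its proof uses decoupling in a crucial way.) Therefore, the claim for Steinhaus sequences follows. $\blacksquare$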
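The first step of the proof splits $\varepsilon^* B \varepsilon$ into four real bilinear forms. This step is a purely algebraic identity for Hermitian $B$, and the following numerical check (assuming NumPy) confirms it for a random Hermitian zero-diagonal $B$ and a Steinhaus sequence.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = C + C.conj().T                         # Hermitian
np.fill_diagonal(B, 0.0)                   # zero diagonal, as assumed in Theorem 3
eps = np.exp(2j * np.pi * rng.random(n))   # Steinhaus sequence
a, b = eps.real, eps.imag                  # Re(eps), Im(eps)
R, S = B.real, B.imag                      # Re(B) symmetric, Im(B) antisymmetric

quad = np.vdot(eps, B @ eps)               # eps^* B eps, real up to rounding
split = a @ R @ a - a @ S @ b + b @ S @ a + b @ R @ b
print(abs(quad.imag) < 1e-10, np.isclose(quad.real, split))  # True True
```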
Note that $B(x)$ defined in (14) satisfies the hypotheses of Theorem 3 by definition. The pseudo-metrics are given by

$$d_2(x, y) = \|B(x) - B(y)\|_F = \Big( \sum_{q' \neq q} |x^* A_{q'}^* A_q x - y^* A_{q'}^* A_q y|^2 \Big)^{1/2} \tag{19}$$

and

$$d_1(x, y) = \|B(x) - B(y)\|_{2 \to 2}.$$

The bound on the expected restricted isometry constant then follows from the following estimates on the covering numbers of $T_s$ with respect to $d_1$ and $d_2$. The corresponding proofs are detailed in Section 3. We start with $N(T_s, d_2, u)$.

Lemma 4 For $u > 0$, it holds

$$\log(N(T_s, d_2, u)) \le s \log(en^2/s) + s \log(1 + 4\sqrt{sn}\, u^{-1}).$$

The above estimate is useful only for small $u > 0$. For large $u$ we require the following alternative bound.

Lemma 5 The diameter of $T_s$ with respect to $d_2$ is bounded by $4\sqrt{sn}$, and for $\sqrt{n} \le u \le 4\sqrt{sn}$, it holds

$$\log(N(T_s, d_2, u)) \le c\, u^{-2} n s^{3/2} \log(n s^{5/2} u^{-1}),$$

where $c > 0$ is a universal constant.

Covering number estimates with respect to $d_1$ are provided in the following lemma.

Lemma 6 The diameter of $T_s$ with respect to $d_1$ is bounded by $4s$, and for $u > 0$,

$$\log(N(T_s, d_1, u)) \le \min\left\{ s \log(en^2/s) + s \log(1 + 4s u^{-1}),\; c\, u^{-2} s^2 \log(2n) \log(n^2/u) \right\}, \tag{20}$$

where $c > 0$ is a universal constant.

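Based on these estimates and Theorem 3, we complete the proof of Theorem 1(a). By Lemmas 4 and 5, the subgaussian integral in (17) can be estimated as

$$\begin{aligned}
\int_0^\infty \sqrt{\log(N(T_s, d_2, u))}\, du &= \int_0^{4\sqrt{sn}} \sqrt{\log(N(T_s, d_2, u))}\, du \\
&= \int_0^{\sqrt{n}} \sqrt{\log(N(T_s, d_2, u))}\, du + \int_{\sqrt{n}}^{4\sqrt{sn}} \sqrt{\log(N(T_s, d_2, u))}\, du \\
&\le \int_0^{\sqrt{n}} \sqrt{s \log(en^2/s)}\, du + \int_0^{\sqrt{n}} \sqrt{s \log(1 + 4\sqrt{sn}\, u^{-1})}\, du \\
&\quad + c \sqrt{n s^{3/2}} \int_{\sqrt{n}}^{4\sqrt{sn}} u^{-1} \sqrt{\log(n s^{5/2} u^{-1})}\, du \\
&\le \sqrt{sn \log(en^2/s)} + 4s\sqrt{n} \int_0^{1/(4\sqrt{s})} \sqrt{\log(1 + v^{-1})}\, dv + c \sqrt{s^{3/2} n \log(n^{1/2} s^{5/2})}\, \log(4\sqrt{s}) \\
&\le \sqrt{sn \log(en^2/s)} + \sqrt{sn \log(e(1 + 4\sqrt{s}))} + c' \sqrt{s^{3/2} n \log(n) \log^2(s)} \\
&\le C_1 \sqrt{s^{3/2} n \log(n) \log^2(s)}.
\end{aligned} \tag{21}$$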
Here we have used [H, Lemma 10.3], the inequality $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$, and $s \le n$. Due to Lemma 6, the subexponential integral obeys the following estimate, for some $\kappa > 0$ to be chosen below:

$$\begin{aligned}
\int_0^\infty \log(N(T_s, d_1, u))\, du &= \int_0^{4s} \log(N(T_s, d_1, u))\, du \\
&= \int_0^{\kappa} \log(N(T_s, d_1, u))\, du + \int_{\kappa}^{4s} \log(N(T_s, d_1, u))\, du \\
&\le \kappa s \log(en^2/s) + s \int_0^{\kappa} \log(1 + 4s u^{-1})\, du + c s^2 \log(2n) \int_{\kappa}^{4s} u^{-2} \log(n^2/u)\, du \\
&\le \kappa s \log(en^2/s) + \kappa s \log\big(e(1 + 4s\kappa^{-1})\big) + c s^2 \kappa^{-1} \log(2n) \log(n^2/\kappa).
\end{aligned}$$

Choose $\kappa = \sqrt{s \log(n)}$ to reach

$$\int_0^\infty \log(N(T_s, d_1, u))\, du \le C_2 s^{3/2} \log^{3/2}(n). \tag{22}$$
Combining the above integral estimates with (15) and Theorem 3 yields

$$\mathbb{E}\,\delta_s = \frac{1}{n} \mathbb{E} \sup_{x \in T_s} |Y_x - Y_0| \le \frac{K}{n} \max\left\{ C_1 \sqrt{s^{3/2} n \log(n) \log^2(s)},\; C_2 s^{3/2} \log^{3/2}(n) \right\}. \tag{23}$$

This is the statement of Theorem 1(a).
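The reduction at the start of this section rests on two facts: $\Psi_g = n^{-1/2} \sum_q \varepsilon_q A_q$, and the consequence of (13) and (16) that $Y_x = \varepsilon^* B(x) \varepsilon = n\, x^* (\Psi_g^* \Psi_g - I) x$. Taking the supremum over $T_s$ then gives (15). The sketch below checks both facts numerically for a Rademacher sequence and one random sparse unit vector. It assumes NumPy, our column ordering $\lambda = (k, \ell) \mapsto \ell n + k$, and the hypothetical helper names `A_q` and `B_of`.

```python
import numpy as np


def A_q(q: int, n: int) -> np.ndarray:
    """A_q = (T^q | M T^q | ... | M^{n-1} T^q); column l*n + k equals pi(k, l) e_q."""
    omega = np.exp(2j * np.pi / n)
    Tq = np.roll(np.eye(n), q, axis=0)               # (T^q h)_j = h_{j-q}
    return np.hstack([np.diag(omega ** (l * np.arange(n))) @ Tq for l in range(n)])


rng = np.random.default_rng(3)
n, s = 5, 3
eps = rng.choice([-1.0, 1.0], size=n)                # Rademacher sequence
A = [A_q(q, n) for q in range(n)]
Psi = sum(eps[q] * A[q] for q in range(n)) / np.sqrt(n)   # Psi_g = n^{-1/2} sum_q eps_q A_q


def B_of(x: np.ndarray) -> np.ndarray:
    """B(x)_{q',q} = x^* A_{q'}^* A_q x for q' != q and 0 on the diagonal, cf. (14)."""
    Ax = [Aq @ x for Aq in A]
    B = np.array([[np.vdot(Ax[qp], Ax[q]) for q in range(n)] for qp in range(n)])
    np.fill_diagonal(B, 0.0)
    return B


x = np.zeros(n * n, dtype=complex)
x[rng.choice(n * n, size=s, replace=False)] = rng.standard_normal(s)
x /= np.linalg.norm(x)                               # a unit-norm s-sparse vector in T_s
Y = np.vdot(eps, B_of(x) @ eps)                      # Y_x = eps^* B(x) eps, cf. (16)
lhs = n * np.vdot(x, (Psi.conj().T @ Psi - np.eye(n * n)) @ x)
print(np.allclose(Y, lhs))                           # True
```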



Remark 7 In analogy to the estimate of a subgaussian entropy integral arising in the analysis of partial random circulant matrices in [46], we expect that the exponent $3/2$ in (21) can be improved to $1$. However, we doubt that such an improvement is possible for the subexponential integral (22). Indeed, the estimate of the subexponential integral in [46] also exhibits an exponent of $3/2$ in the $s$-term. We therefore did not pursue an improvement of (21) here, as it would not provide a significant overall improvement of (23). We expect that an improvement of (23) would require more sophisticated tools than the Dudley-type estimate for chaos processes of Theorem 3.

3 Proof of covering number estimates 

In this section we provide the covering number estimates of Lemmas 4, 5, and 6, which are crucial to the proof of our main result. We first introduce additional notation. Let $\delta(m, k) = \delta_{0, m-k}$ and $\delta(m) = \delta_{0, m}$ be the Kronecker symbol as usual. We denote by $\operatorname{supp} x = \{\ell : x_\ell \neq 0\}$ the support of a vector $x$. Let $A$ be a matrix with vector of singular values $\sigma(A)$. For $0 < q \le \infty$, the Schatten $S_q$-norm is defined by

$$\|A\|_{S_q} := \|\sigma(A)\|_q, \tag{24}$$

where $\|\cdot\|_q$ is the usual vector $\ell_q$ norm. For an integer $p$, the $S_{2p}$ norm can be expressed as

$$\|A\|_{S_{2p}} = \big( \operatorname{Tr}((A^* A)^p) \big)^{1/(2p)}. \tag{25}$$

The $S_\infty$-norm coincides with the operator norm, $\|\cdot\|_{S_\infty} = \|\cdot\|_{2 \to 2}$. By the corresponding properties of $\ell_q$-norms we have the inequalities

$$\|A\|_{2 \to 2} \le \|A\|_{S_q} \le \operatorname{rank}(A)^{1/q} \|A\|_{2 \to 2}. \tag{26}$$

Moreover, we will require an extension of the quadratic form $B(x)$ in (14) to a bilinear form,

$$(B(x, z))_{q', q} = x^* W_{q', q}\, z, \tag{27}$$

that is, $(B(x, z))_{q', q} = x^* A_{q'}^* A_q z$ for $q' \neq q$ and $(B(x, z))_{q, q} = 0$. Then $B(x) = B(x, x)$.
3.1 Time-frequency analysis on $\mathbb{C}^n$

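Before passing to the actual covering number estimates, we provide some facts and estimates related to time-frequency analysis on $\mathbb{C}^n$. Observe that the matrices $A_q$ introduced in (12) satisfy

$$A_q^* = \begin{pmatrix} (T^q)^* \\ (M T^q)^* \\ (M^2 T^q)^* \\ \vdots \\ (M^{n-1} T^q)^* \end{pmatrix} = \begin{pmatrix} T^{-q} \\ T^{-q} M^{-1} \\ T^{-q} M^{-2} \\ \vdots \\ T^{-q} M^{-(n-1)} \end{pmatrix}$$

and, hence, with indices taken modulo $n$,

$$(A_q^* y)_{(k, \ell)} = y_{k+q}\, \omega^{-\ell(k+q)}.$$

Clearly,

$$\langle A_q z, y \rangle = \langle z, A_q^* y \rangle = \sum_{k, \ell} z_{(k, \ell)}\, \overline{y_{k+q}}\, \omega^{\ell(k+q)} = \sum_{k, \ell} z_{(k-q, \ell)}\, \overline{y_k}\, \omega^{\ell k} = \sum_k \Big( \sum_\ell z_{(k-q, \ell)}\, \omega^{\ell k} \Big) \overline{y_k},$$

and, hence,

$$(A_q z)_k = \sum_\ell z_{(k-q, \ell)}\, \omega^{\ell k}.$$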



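In the following, $\mathcal{F} : \mathbb{C}^n \to \mathbb{C}^n$ denotes the normalized Fourier transform, that is,

$$(\mathcal{F} v)_\ell = n^{-1/2} \sum_{q=0}^{n-1} \omega^{\ell q} v_q.$$

Let $\{e_\lambda\}_{\lambda \in \mathbb{Z}_n \times \mathbb{Z}_n}$ and $\{e_q\}_{q \in \mathbb{Z}_n}$ denote the Euclidean bases of $\mathbb{C}^{n^2}$ and $\mathbb{C}^n$, respectively, and let $P_\lambda$ denote the orthogonal projection onto the one-dimensional space $\operatorname{span}\{e_\lambda\}$. The following bounds will be crucial for the covering number estimates below.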



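Lemma 8 Let $A_q$ be as given in (12). Then, for $\lambda \in \mathbb{Z}_n \times \mathbb{Z}_n$, $q \in \mathbb{Z}_n$,

$$A_q e_\lambda = \pi(\lambda) e_q, \tag{28}$$

$$\sum_{q=0}^{n-1} A_q^* A_q = nI, \tag{29}$$

$$\sum_{q=0}^{n-1} A_q P_\lambda A_q^* = I, \tag{30}$$

$$\sum_{q=0}^{n-1} \sum_{q'=0}^{n-1} |x^* A_{q'}^* A_q y|^2 \le n \|x\|_0 \|x\|_2^2 \|y\|_2^2. \tag{31}$$

Proof: For (28), observe that

$$(A_q e_{(k_0, \ell_0)})_k = \sum_\ell \delta(k - q - k_0)\, \delta(\ell - \ell_0)\, \omega^{\ell k} = \delta(q - (k - k_0))\, \omega^{\ell_0 k} = (\pi(k_0, \ell_0) e_q)_k.$$

To see (29), choose $z \in \mathbb{C}^{n^2}$ and compute

$$(A_q^* A_q z)_{(k', \ell')} = \sum_\ell z_{(k', \ell)}\, \omega^{\ell(k'+q)}\, \omega^{-\ell'(k'+q)} = \sum_\ell z_{(k', \ell)}\, \omega^{(\ell - \ell')(k'+q)}.$$

Hence,

$$\Big( \sum_q A_q^* A_q z \Big)_{(k', \ell')} = \sum_\ell z_{(k', \ell)} \sum_q \omega^{(\ell - \ell')(k'+q)} = n\, z_{(k', \ell')}.$$

For (30), observe that all but one column of $A_q P_{(k_0, \ell_0)}$ are $0$. The nonzero column is column $(k_0, \ell_0)$, and only its $(k_0 + q)$th entry is nonzero, namely, it is $\omega^{\ell_0(k_0+q)}$. We have

$$A_q P_{(k_0, \ell_0)} A_q^* = A_q P_{(k_0, \ell_0)} P_{(k_0, \ell_0)} A_q^* = A_q P_{(k_0, \ell_0)} \big( A_q P_{(k_0, \ell_0)} \big)^*,$$

and hence $A_q P_{(k_0, \ell_0)} A_q^* = P_{\{k_0+q\}}$, the orthogonal projection onto $\operatorname{span}\{e_{k_0+q}\} \subset \mathbb{C}^n$, and $\sum_q A_q P_{(k_0, \ell_0)} A_q^* = I$.

Finally, for (31), let $x \in \mathbb{C}^{n^2}$ and $\Lambda = \operatorname{supp} x$, and write $y_{(j, \cdot)} = (y_{(j, \ell)})_{\ell \in \mathbb{Z}_n} \in \mathbb{C}^n$. Then, by the Cauchy-Schwarz inequality,

$$\begin{aligned}
\sum_q \sum_{q'} |x^* A_{q'}^* A_q y|^2 &= \sum_q \sum_{q'} \Big| \sum_{(k', \ell') \in \Lambda} \overline{x_{(k', \ell')}}\, (A_{q'}^* A_q y)_{(k', \ell')} \Big|^2 \\
&\le \|x\|_2^2 \sum_q \sum_{q'} \sum_{(k', \ell') \in \Lambda} |(A_{q'}^* A_q y)_{(k', \ell')}|^2 \\
&= \|x\|_2^2 \sum_{(k', \ell') \in \Lambda} \sum_q \sum_{q'} \Big| \sum_\ell y_{(k' - (q - q'), \ell)}\, \omega^{\ell(k'+q')} \Big|^2 \\
&= n \|x\|_2^2 \sum_{(k', \ell') \in \Lambda} \sum_{m \in \mathbb{Z}_n} \|\mathcal{F} y_{(k' - m, \cdot)}\|_2^2 \\
&= n \|x\|_2^2 \sum_{(k', \ell') \in \Lambda} \|y\|_2^2 = n \|x\|_0 \|x\|_2^2 \|y\|_2^2
\end{aligned}$$

by unitarity of $\mathcal{F}$. In the fourth step we substituted $m = q - q'$ and $p = k' + q'$, which for fixed $(k', \ell')$ is a bijection of $\mathbb{Z}_n \times \mathbb{Z}_n$, and used $\sum_p \big| \sum_\ell y_{(k'-m, \ell)}\, \omega^{\ell p} \big|^2 = n \|\mathcal{F} y_{(k'-m, \cdot)}\|_2^2$. $\blacksquare$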
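Identities (28)-(30) and the bound (31) can be confirmed on a small example. The sketch below assumes NumPy and stores $\lambda = (k, \ell)$ at column $\ell n + k$, which is consistent with the block structure of $A_q$ in (12). The helper names `A_q` and `tf_shift` are hypothetical.

```python
import numpy as np


def A_q(q: int, n: int) -> np.ndarray:
    """A_q = (T^q | M T^q | ... | M^{n-1} T^q); column l*n + k equals pi(k, l) e_q."""
    omega = np.exp(2j * np.pi / n)
    Tq = np.roll(np.eye(n), q, axis=0)
    return np.hstack([np.diag(omega ** (l * np.arange(n))) @ Tq for l in range(n)])


def tf_shift(k: int, l: int, n: int) -> np.ndarray:
    """pi(k, l) = M^l T^k."""
    omega = np.exp(2j * np.pi / n)
    return np.diag(omega ** (l * np.arange(n))) @ np.roll(np.eye(n), k, axis=0)


n = 5
A = [A_q(q, n) for q in range(n)]
I_n, I_N = np.eye(n), np.eye(n * n)

# (28): A_q e_lambda = pi(lambda) e_q
ok28 = all(np.allclose(A[q][:, l * n + k], tf_shift(k, l, n) @ I_n[:, q])
           for q in range(n) for k in range(n) for l in range(n))
# (29): sum_q A_q^* A_q = n I
ok29 = np.allclose(sum(Aq.conj().T @ Aq for Aq in A), n * I_N)
# (30): sum_q A_q P_lambda A_q^* = I, where A_q P_lambda A_q^* = a a^* with a = A_q e_lambda
ok30 = all(np.allclose(sum(np.outer(Aq[:, c], Aq[:, c].conj()) for Aq in A), I_n)
           for c in range(n * n))
# (31): sum_{q,q'} |x^* A_{q'}^* A_q y|^2 <= n ||x||_0 ||x||_2^2 ||y||_2^2
rng = np.random.default_rng(5)
x = np.zeros(n * n, dtype=complex)
x[rng.choice(n * n, size=3, replace=False)] = rng.standard_normal(3)
y = rng.standard_normal(n * n) + 1j * rng.standard_normal(n * n)
lhs = sum(abs(np.vdot(Ap @ x, Aq @ y)) ** 2 for Ap in A for Aq in A)
rhs = n * np.count_nonzero(x) * np.linalg.norm(x) ** 2 * np.linalg.norm(y) ** 2
print(ok28, ok29, ok30, lhs <= rhs)  # True True True True
```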



3.2 Proof of Lemma 4

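For $x, y \in \mathbb{C}^{n^2}$, write $x^* W_{q',q} x - y^* W_{q',q} y = x^* W_{q',q}(x - y) + (x - y)^* W_{q',q}\, y$ and apply the triangle inequality in $\ell_2$ to obtain

$$d_2(x, y) \le \Big( \sum_{q' \neq q} |x^* A_{q'}^* A_q (x - y)|^2 \Big)^{1/2} + \Big( \sum_{q' \neq q} |(x - y)^* A_{q'}^* A_q y|^2 \Big)^{1/2}.$$

Inequality (31) implies that, for $x, y \in T_s$,

$$\Big( \sum_{q' \neq q} |x^* A_{q'}^* A_q (x - y)|^2 \Big)^{1/2} \le \sqrt{sn}\, \|x - y\|_2 \quad \text{and} \quad \Big( \sum_{q' \neq q} |(x - y)^* A_{q'}^* A_q y|^2 \Big)^{1/2} \le \sqrt{sn}\, \|x - y\|_2,$$

where for the second bound we use $|(x - y)^* A_{q'}^* A_q y| = |y^* A_q^* A_{q'} (x - y)|$ and the $s$-sparsity of $y$. Hence,

$$d_2(x, y) \le 2\sqrt{sn}\, \|x - y\|_2. \tag{32}$$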
Using the volumetric argument (see, for example, [44, Proposition 10.1]), we obtain

$$N(T_s, \|\cdot\|_2, u) \le \binom{n^2}{s} (1 + 2/u)^s \le (en^2/s)^s (1 + 2/u)^s.$$

By a rescaling argument,

$$N(T_s, d_2, u) \le N(T_s, 2\sqrt{sn}\, \|\cdot\|_2, u) = N(T_s, \|\cdot\|_2, u/(2\sqrt{sn})) \le (en^2/s)^s (1 + 4\sqrt{sn}\, u^{-1})^s.$$

Taking the logarithm completes the proof.

3.3 Proof of Lemma 5

Now, we seek a suitable estimate of the covering numbers $N(T_s, d_2, u)$ for $u \ge \sqrt{n}$. Observe that by (32) the diameter of $T_s$ with respect to $d_2$ is at most $4\sqrt{sn}$. Hence, it suffices to consider $N(T_s, d_2, u)$ for

$$\sqrt{n} \le u \le 4\sqrt{sn}, \tag{33}$$

as stated in the lemma. We use the empirical method [14], similarly as in [49]. We define the norm $\|\cdot\|_*$ on $\mathbb{C}^{n^2}$ by

$$\|x\|_* = \|\operatorname{Re} x\|_1 + \|\operatorname{Im} x\|_1. \tag{34}$$



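For $x \in T_s$ we define a random vector $Z$, which takes the value $\|x\|_* \operatorname{sgn}(\operatorname{Re} x_\lambda)\, e_\lambda$ with probability $|\operatorname{Re} x_\lambda| / \|x\|_*$ and the value $i \|x\|_* \operatorname{sgn}(\operatorname{Im} x_\lambda)\, e_\lambda$ with probability $|\operatorname{Im} x_\lambda| / \|x\|_*$. In particular, $\mathbb{E} Z = x$.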
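The random vector $Z$ of the empirical method is easy to simulate. A simulation illustrates the two properties used below: every realization has exactly one nonzero entry, of modulus $\|x\|_*$, and $\mathbb{E} Z = x$. The sketch assumes NumPy. `sample_Z` is a hypothetical helper, and a flat vector of length $N$ stands in for $\mathbb{C}^{n^2}$.

```python
import numpy as np


def sample_Z(x: np.ndarray, m: int, rng: np.random.Generator):
    """Draw m independent copies of Z (one copy per row).

    With probability |Re x_l| / ||x||_*, Z = ||x||_* sgn(Re x_l) e_l;
    with probability |Im x_l| / ||x||_*, Z = i ||x||_* sgn(Im x_l) e_l.
    """
    N = x.size
    weights = np.concatenate([np.abs(x.real), np.abs(x.imag)])
    norm_star = weights.sum()                      # ||x||_* = ||Re x||_1 + ||Im x||_1
    atoms = rng.choice(2 * N, size=m, p=weights / norm_star)
    idx = atoms % N                                # position lambda of the single nonzero entry
    vals = np.where(atoms >= N,
                    1j * np.sign(x[idx].imag),     # imaginary atoms
                    np.sign(x[idx].real))          # real atoms
    Z = np.zeros((m, N), dtype=complex)
    Z[np.arange(m), idx] = norm_star * vals
    return Z, norm_star


rng = np.random.default_rng(6)
N, s, m = 16, 3, 20000
x = np.zeros(N, dtype=complex)
x[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s) + 1j * rng.standard_normal(s)
x /= np.linalg.norm(x)                             # unit-norm and s-sparse
Z, norm_star = sample_Z(x, m, rng)
print(norm_star <= np.sqrt(2 * s))                 # ||x||_* <= sqrt(2s): True
print(np.max(np.abs(Z.mean(axis=0) - x)))          # small: the sample mean approximates E Z = x
```

The deviation of the sample mean shrinks roughly like $\|x\|_*/\sqrt{m}$. This is the mechanism behind the choice of $m$ in the estimate (36).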

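Now, let $Z_1, \dots, Z_m, Z'_1, \dots, Z'_m$ be independent copies of $Z$. We set $y = \frac{1}{m} \sum_{j=1}^m Z_j$ and $y' = \frac{1}{m} \sum_{j=1}^m Z'_j$ and attempt to approximate $B(x)$ by

$$\hat{B} = B(y, y') = \frac{1}{m^2} \sum_{j, j' = 1}^m B(Z_j, Z'_{j'}). \tag{35}$$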



First, compute 



\B-B(x)\\ 2 F = E^2\x*Wg>, q x j £ ZjWfxZ'j, 



9,9 



E 



I— E zyw^zj, 



E ( - l**wv,,x| a + -j E *.W i w 9 j2,& i „yw;jZ i 



9:9' 



where we used that E[Z*W q . q ' Z'^,] = x*W qtq >x, j,f — 1, . . .to, by independence. Moreover, for j ^ j' 
and j' ^ j" , independence implies 



E[z*w q , q ,z>,(z>„ywi q ,z r 

To estimate summands with j' = j" , note that 



= \x*W q , q iX\ 



Z*W q ,, q Z' J ,{Z' r yW q , q ,Z r „ = \\x\\lZ*A* q ,A q P {x} A* q A q ,Z 



13 



where {A} = supp Zy is random. Hence, in this case, we compute using (|30p in Lemma [3] 



Y,^[z*A* ql A q Z' jl {Z' ji yA* q A q ,Z jl 



q'=£q 



< WxWl^l^A^AgP^A^Z,, 



q',q 



\\x\\Ie [z* £ [a;, ( £ a 9 p {a} a;) A q ) z 

q' q 

\\x\\lE[z*J2( A * q ' A i') z f"} =n\\x\\lE[Z*Z j 
X n\\x\\*E[zm[Z r »] 



if .7 =f 



= n\\x\\l\\x\\2 < n\\x\\ 2 , else. 



Symmetry implies an identical estimate for j = j'" , j' j". As x G T s is s-sparse we have \\x\\* < 
\/2||a;||i < \/2s||x||2 < \/2s. We conclude 



E E ^[z;Wq.q,zi,(z>,,ywi q ,z T 

q\q j,j'd",j"'=i 

< m 2 (m - if ^2 \x*W qA >x\ 2 + m 2 n4s 2 + 2m 2 (m - l)n • 2s. 



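For $m\ge11u^{-2}ns^{3/2}$ and $u\le4\sqrt{sn}$, we finally obtain, since $(m-1)^2\le m^2$,

$$\mathbb{E}\|\tilde B-B(x)\|_F^2\le\Big(\frac{m^2(m-1)^2}{m^4}-1\Big)\sum_{q',q}|x^*W_{q',q}x|^2+\frac{m^2n\,4s^2}{m^4}+\frac{4m^2(m-1)ns}{m^4}\le\frac{4ns^2}{m^2}+\frac{4ns}{m}\le\frac{4u^4}{121ns}+\frac{4u^2}{11\sqrt s}\le\frac{64u^2}{121}+\frac{44u^2}{121}<u^2. \tag{36}$$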
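The arithmetic behind (36) can be checked numerically. The sketch below is an illustration over sampled parameters, not part of the proof: it takes $\sqrt n\le u\le4\sqrt{sn}$, sets $m = \lceil11u^{-2}ns^{3/2}\rceil$, and tests both $4ns^2/m^2+4ns/m<u^2$ and $m\le27u^{-2}ns^{3/2}$. The function names are hypothetical.

```python
import math

def bound_36(n, s, m):
    """The quantity 4 n s^2 / m^2 + 4 n s / m that bounds E||B~ - B(x)||_F^2."""
    return 4 * n * s**2 / m**2 + 4 * n * s / m

def check_m(n, s, u):
    """Return m = ceil(11 u^-2 n s^(3/2)) together with the truth values of
    bound_36 < u^2 and m <= 27 u^-2 n s^(3/2)."""
    a = n * s**1.5 / u**2
    m = math.ceil(11 * a)
    return m, bound_36(n, s, m) < u**2, m <= 27 * a

for n in (4, 16, 64, 256):
    for s in (1, 2, 5, 10):
        lo, hi = math.sqrt(n), 4 * math.sqrt(s * n)  # the range (33)
        for t in range(11):
            u = lo + t * (hi - lo) / 10
            m, below, small = check_m(n, s, u)
            assert below and small, (n, s, u, m)
print("all sampled (n, s, u) satisfy both inequalities")
```

The second inequality uses $u^2\le16sn$, which gives $u^{-2}ns^{3/2}\ge1/16$, so that rounding up $11u^{-2}ns^{3/2}$ costs at most $16u^{-2}ns^{3/2}$.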



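Since $\|x\|_*$ can take any value in $[1,\sqrt{2s}]$, we still have to discretize this factor in the definition of the random variable $Z$. To this end, write $Z_j = \|x\|_*p_je_{\lambda_j}$ and $Z_{j'}' = \|x\|_*p_{j'}'e_{\lambda_{j'}'}$ with $p_j,p_{j'}'\in\{1,-1,i,-i\}$, and set, for $a>0$,

$$\tilde B_a = \frac{1}{m^2}\sum_{j,j'=1}^mB\big(a\,p_je_{\lambda_j},\,a\,p_{j'}'e_{\lambda_{j'}'}\big).$$

Next, we observe that, for $\lambda = (k,\ell)$ and $\lambda' = (k',\ell')$,

$$B(e_{\lambda'},e_\lambda)_{q',q} = \begin{cases}(A_{q'}e_{\lambda'})^*A_qe_\lambda = \langle\pi(\lambda)e_q,\pi(\lambda')e_{q'}\rangle, & q'\ne q,\\ 0, & \text{else}.\end{cases} \tag{37}$$

Since $\pi(\lambda)e_q$ is a unimodular multiple of $e_{q+k}$, the entry $\langle\pi(\lambda)e_q,\pi(\lambda')e_{q'}\rangle$ has modulus one if $q+k\equiv q'+k'\pmod n$ and vanishes otherwise. Hence every row and every column of $B(e_{\lambda'},e_\lambda)$ contains at most one nonzero entry, which has modulus one, so that $\|B(e_{\lambda'},e_\lambda)\|_F^2\le n$ and $\|B(e_{\lambda'},e_\lambda)\|_{2\to2}\le1$. Now, assume $a$ is chosen such that $\big|\|x\|_*^2-a^2\big|\le u/\sqrt n$. Since $B(\cdot,\cdot)$ is sesquilinear and $|p_j| = |p_{j'}'| = 1$, we obtain

$$\|\tilde B_a-\tilde B\|_F = \Big\|\frac{1}{m^2}\sum_{j,j'=1}^mB\big(a\,p_je_{\lambda_j},a\,p_{j'}'e_{\lambda_{j'}'}\big)-\frac{1}{m^2}\sum_{j,j'=1}^mB\big(\|x\|_*p_je_{\lambda_j},\|x\|_*p_{j'}'e_{\lambda_{j'}'}\big)\Big\|_F\le\big|\|x\|_*^2-a^2\big|\,\frac{1}{m^2}\sum_{j,j'=1}^m\|B(e_{\lambda_j},e_{\lambda_{j'}'})\|_F\le\frac{u}{\sqrt n}\sqrt n = u. \tag{38}$$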
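The sparsity pattern behind (37) can be inspected directly. The following sketch builds $B(e_{\lambda'},e_\lambda)$ for all pairs of time-frequency indices when $n = 4$. It assumes the convention $\pi(\lambda) = M_\ell T_k$ with $(T_kh)_t = h_{t-k}$ and $(M_\ell h)_t = e^{2\pi i\ell t/n}h_t$; the structural facts it checks only use that $\pi(\lambda)e_q$ is a unimodular multiple of $e_{q+k}$. The helper names are hypothetical.

```python
import cmath

def tf_apply_basis(n, lam, q):
    """pi(lam) e_q for lam = (k, l), assuming pi(lam) = M_l T_k as described above.
    Returns (t, c) such that pi(lam) e_q = c * e_t."""
    k, l = lam
    t = (q + k) % n
    return t, cmath.exp(2j * cmath.pi * l * t / n)

def b_elementary(n, lam_p, lam):
    """B(e_{lam'}, e_lam): entry (q', q) is <pi(lam) e_q, pi(lam') e_{q'}> for q' != q, else 0."""
    B = [[0j] * n for _ in range(n)]
    for qp in range(n):
        tp, cp = tf_apply_basis(n, lam_p, qp)
        for q in range(n):
            if q == qp:
                continue  # W_{q,q} = 0
            t, c = tf_apply_basis(n, lam, q)
            if t == tp:
                B[qp][q] = cp.conjugate() * c  # (pi(lam') e_{q'})^* (pi(lam) e_q)
    return B

def frobenius_sq(B):
    return sum(abs(v) ** 2 for row in B for v in row)

n = 4
pairs = [(k, l) for k in range(n) for l in range(n)]
largest = max(frobenius_sq(b_elementary(n, lp, lam)) for lp in pairs for lam in pairs)
print(largest)  # compare with n
```

In this example the Frobenius norm squared equals $n$ when $k\ne k'$ and $0$ when $k = k'$, since for $k = k'$ the only candidate entry lies on the excluded diagonal.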



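We conclude that it suffices to choose

$$K := \Big\lceil\frac{(2s-1)\sqrt n}{u}\Big\rceil\le\Big\lceil\frac{2s\sqrt n}{u}\Big\rceil$$

values $\alpha_k\in J_s := [1,2s]$, $k = 1,\dots,K$, such that for each $\beta\in J_s$ there exists $k$ satisfying $|\beta-\alpha_k|\le u/\sqrt n$; note that $\|x\|_*^2\in J_s$.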

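Now, given $x$, by (36) we can find $z_1,\dots,z_m,z_1',\dots,z_m'$ of the form $\|x\|_*p_\lambda e_\lambda$, $p_\lambda\in\{1,-1,i,-i\}$, such that $\|\tilde B-B(x)\|_F<u$, where $\tilde B = \frac{1}{m^2}\sum_{j,j'=1}^mB(z_j,z_{j'}')$. Further, we can find $k$ such that $\big|\|x\|_*^2-\alpha_k\big|\le u/\sqrt n$. We replace the $z_1,\dots,z_m,z_1',\dots,z_m'$ by the respective $\tilde z_1,\dots,\tilde z_m,\tilde z_1',\dots,\tilde z_m'$ of the form $a_kp_\lambda e_\lambda$ with $a_k = \sqrt{\alpha_k}$. Then, using (36), (38) and the triangle inequality, we obtain

$$\Big\|B(x)-\frac{1}{m^2}\sum_{j,j'=1}^mB(\tilde z_j,\tilde z_{j'}')\Big\|_F\le2u.$$

Now, each $\tilde z_j,\tilde z_j'$ can take at most $\lceil2s\sqrt n/u\rceil\cdot4\cdot n^2$ values, so that $\frac{1}{m^2}\sum_{j,j'=1}^mB(\tilde z_j,\tilde z_{j'}')$ can take at most $(4\lceil2s\sqrt n/u\rceil n^2)^{2m}\le(Csn^{5/2}/u)^{2m}$ values. Hence, we found a $2u$-covering of the set of matrices $B(x)$ with $x\in T_s$ of cardinality at most $(Csn^{5/2}/u)^{2m}$. Unfortunately, the matrices of the covering are not necessarily of the form $B(x)$. Nevertheless, we may replace each relevant matrix $\frac{1}{m^2}\sum_{j,j'=1}^mB(\tilde z_j,\tilde z_{j'}')$ by a matrix $B(\hat x)$ with

$$\Big\|B(\hat x)-\frac{1}{m^2}\sum_{j,j'=1}^mB(\tilde z_j,\tilde z_{j'}')\Big\|_F\le2u.$$

(Clearly, if for a matrix $\frac{1}{m^2}\sum_{j,j'=1}^mB(\tilde z_j,\tilde z_{j'}')$ there is no such $\hat x$, then we can discard that matrix.)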
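The count of possible values, and an admissible constant $C$, can be made explicit. If $u\le4\sqrt{sn}$, then $1\le4s\sqrt n/u$, hence $4\lceil2s\sqrt n/u\rceil n^2\le4(2s\sqrt n/u+4s\sqrt n/u)n^2 = 24\,sn^{5/2}/u$, so $C = 24$ is one possible choice (our computation, not a value stated in the text). The sketch below evaluates the count and the resulting log-cardinality on sampled parameters; the function names are hypothetical.

```python
import math

def values_per_vector(n, s, u):
    """Possible vectors sqrt(alpha_k) p e_lam: ceil(2 s sqrt(n) / u) levels, 4 phases, n^2 indices."""
    return math.ceil(2 * s * math.sqrt(n) / u) * 4 * n**2

def log_cover_size(n, s, u, m):
    """log((number of values)^(2m)): log-cardinality of the 2u-covering of {B(x)}."""
    return 2 * m * math.log(values_per_vector(n, s, u))

# Sample sqrt(n) <= u <= 4 sqrt(sn) and compare the count with 24 s n^(5/2) / u.
worst_ratio = 0.0
for n in (4, 16, 64):
    for s in (1, 3, 8):
        lo, hi = math.sqrt(n), 4 * math.sqrt(s * n)
        for t in range(11):
            u = lo + t * (hi - lo) / 10
            worst_ratio = max(worst_ratio, values_per_vector(n, s, u) / (24 * s * n**2.5 / u))
print(worst_ratio)  # a value <= 1 is consistent with C = 24
```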



Again, the set of such chosen $\hat x$ has cardinality at most $(Csn^{5/2}/u)^{2m}$ and, by the triangle inequality, for each $x\in T_s$ we can find $\hat x$ of the covering such that

$$d_2(x,\hat x)\le4u.$$

For $m\ge11u^{-2}ns^{3/2}$, we consequently get

$$\log N(T_s,d_2,4u)\le\log\big((Csn^{5/2}/u)^{2m}\big) = 2m\log(Csn^{5/2}/u).$$

The choice $m = \lceil11u^{-2}ns^{3/2}\rceil\le27u^{-2}ns^{3/2}$ (which holds since $u^2\le16sn$) and rescaling $u\mapsto u/4$ give

$$\log N(T_s,d_2,u)\le864\,u^{-2}ns^{3/2}\log(4Csn^{5/2}/u)\le c\,u^{-2}ns^{3/2}\log(sn^{5/2}/u).$$

The proof of Lemma 5 is completed.

3.4 Proof of Lemma 6, Part I

Now we show the estimate

$$\log N(T_s,d_1,u)\le s\log(en^2/s)+s\log(1+4su^{-1}),$$

which will establish one part of (20). Before doing so, we note that one can quickly obtain an estimate for $N(T_s,d_1,u)$ for small $u$ using that the Frobenius norm dominates the operator norm, and, hence, $d_1(x,y)\le d_2(x,y)\le2\sqrt{sn}\|x-y\|_2$. In fact, this estimate would not deteriorate the estimate in Theorem 1(a). But in the proof of Theorem 1(b), the more involved estimate $d_1(x,y)\le2s\|x-y\|_2$ developed below is useful.

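Let us first rewrite $d_1$. Recall (25), namely $A_qe_\lambda = \pi(\lambda)e_q$, and, with $\lambda = (k,\ell)$ and $\lambda' = (k',\ell')$, we obtain

$$\pi(\lambda')^*\pi(\lambda) = \omega^{k'(\ell-\ell')}\pi(\lambda-\lambda') = \omega(\lambda,\lambda')\,\pi(\lambda-\lambda').$$

Writing now $x = \sum_\lambda x_\lambda e_\lambda$, the entries of the matrix $B(x)$ in (21) for $q'\ne q$ are given by

$$B(x)_{q',q} = \sum_{\lambda,\lambda'}\overline{x_{\lambda'}}x_\lambda\,e_{\lambda'}^*A_{q'}^*A_qe_\lambda = \sum_{\lambda,\lambda'}\overline{x_{\lambda'}}x_\lambda\,e_{q'}^*\pi(\lambda')^*\pi(\lambda)e_q = \sum_{\lambda,\lambda'}\overline{x_{\lambda'}}x_\lambda\,\omega(\lambda,\lambda')\,e_{q'}^*\pi(\lambda-\lambda')e_q = \sum_{k\ne k'}\overline{x_{\lambda'}}x_\lambda\,\omega(\lambda,\lambda')\,e_{q'}^*\pi(\lambda-\lambda')e_q = e_{q'}^*\Big(\sum_{k\ne k'}\overline{x_{\lambda'}}x_\lambda\,\omega(\lambda,\lambda')\,\pi(\lambda-\lambda')\Big)e_q.$$

Here and below, $\sum_{k\ne k'}$ denotes the sum over all pairs $\lambda = (k,\ell)$, $\lambda' = (k',\ell')$ with $k\ne k'$. We used for the fourth equality that $e_{q'}^*\pi((0,\ell))e_q = 0$ if $q'\ne q$, that is, pure modulations are diagonal. Since $\pi(\lambda-\lambda')$ has zero diagonal whenever $k\ne k'$, and $B(x)_{q,q} = 0$, this shows that

$$B(x) = \sum_{k\ne k'}\overline{x_{\lambda'}}x_\lambda\,\omega(\lambda,\lambda')\,\pi(\lambda-\lambda').$$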
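The commutation relation can be verified numerically. This sketch assumes $\pi((k,\ell)) = M_\ell T_k$ with $(T_kh)_t = h_{t-k}$ and $(M_\ell h)_t = \omega^{\ell t}h_t$, $\omega = e^{2\pi i/n}$, a convention under which the phase factor comes out as $\omega^{k'(\ell-\ell')}$; other conventions change the phase but not the structure. It compares $\pi(\lambda')^*\pi(\lambda)$ with $\omega^{k'(\ell-\ell')}\pi(\lambda-\lambda')$ for all $\lambda,\lambda'$ when $n = 5$.

```python
import cmath
from itertools import product

def tf_matrix(n, k, l):
    """Matrix of pi((k, l)) = M_l T_k (assumed convention): column q is omega^(l (q+k)) e_{q+k}."""
    P = [[0j] * n for _ in range(n)]
    for q in range(n):
        t = (q + k) % n
        P[t][q] = cmath.exp(2j * cmath.pi * l * t / n)
    return P

def adjoint(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][r] * B[r][j] for r in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

n = 5
omega = cmath.exp(2j * cmath.pi / n)
err = 0.0
for k, l, kp, lp in product(range(n), repeat=4):
    lhs = matmul(adjoint(tf_matrix(n, kp, lp)), tf_matrix(n, k, l))  # pi(lam')^* pi(lam)
    shift = tf_matrix(n, (k - kp) % n, (l - lp) % n)                 # pi(lam - lam')
    phase = omega ** (kp * (l - lp))                                  # omega^{k'(l - l')}
    err = max(err, max(abs(lhs[i][j] - phase * shift[i][j]) for i in range(n) for j in range(n)))
print(err)  # rounding-level if the relation holds
```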

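Since the operator norm is dominated by the Schatten $2p$-norm, $p\in\mathbb{N}$, and $B(x)-B(y)$ is Hermitian, we obtain

$$d_1(x,y)^{2p} = \Big\|\sum_{k\ne k'}\big(\overline{x_{\lambda'}}x_\lambda-\overline{y_{\lambda'}}y_\lambda\big)\omega(\lambda,\lambda')\,\pi(\lambda-\lambda')\Big\|_{2\to2}^{2p}\le\Big\|\sum_{k\ne k'}\big(\overline{x_{\lambda'}}x_\lambda-\overline{y_{\lambda'}}y_\lambda\big)\omega(\lambda,\lambda')\,\pi(\lambda-\lambda')\Big\|_{S_{2p}}^{2p}$$

$$= \sum_{k_1\ne k_1',\dots,k_{2p}\ne k_{2p}'}\big(\overline{x_{\lambda_1'}}x_{\lambda_1}-\overline{y_{\lambda_1'}}y_{\lambda_1}\big)\cdots\big(\overline{x_{\lambda_{2p}'}}x_{\lambda_{2p}}-\overline{y_{\lambda_{2p}'}}y_{\lambda_{2p}}\big)\,\omega(\lambda_1,\lambda_1')\cdots\omega(\lambda_{2p},\lambda_{2p}')\operatorname{Tr}\big(\pi(\lambda_1-\lambda_1')\cdots\pi(\lambda_{2p}-\lambda_{2p}')\big).$$

Setting $(k_0,\ell_0) = \lambda_1-\lambda_1'+\lambda_2-\lambda_2'+\dots+\lambda_{2p}-\lambda_{2p}'$, we observe that the product $\pi(\lambda_1-\lambda_1')\cdots\pi(\lambda_{2p}-\lambda_{2p}')$ is a unimodular multiple of $\pi((k_0,\ell_0))$, so that its trace sums over zero entries if $k_0\ne0$ and sums over roots of unity to zero if $k_0 = 0$ and $\ell_0\ne0$. We conclude that

$$\big|\operatorname{Tr}\big(\pi(\lambda_1-\lambda_1')\cdots\pi(\lambda_{2p}-\lambda_{2p}')\big)\big| = \begin{cases}n, & \lambda_1-\lambda_1'+\dots+\lambda_{2p}-\lambda_{2p}' = 0,\\ 0, & \text{else}.\end{cases}$$

Hence,

$$d_1(x,y)^{2p}\le n\sum_{\lambda_1,\lambda_1'}\big|\overline{x_{\lambda_1'}}x_{\lambda_1}-\overline{y_{\lambda_1'}}y_{\lambda_1}\big|\cdots\sum_{\lambda_{2p-1},\lambda_{2p-1}'}\big|\overline{x_{\lambda_{2p-1}'}}x_{\lambda_{2p-1}}-\overline{y_{\lambda_{2p-1}'}}y_{\lambda_{2p-1}}\big|\sum_{\lambda_{2p}}\big|\overline{x_{\lambda_{2p}+\lambda_1-\lambda_1'+\dots+\lambda_{2p-1}-\lambda_{2p-1}'}}\,x_{\lambda_{2p}}-\overline{y_{\lambda_{2p}+\lambda_1-\lambda_1'+\dots+\lambda_{2p-1}-\lambda_{2p-1}'}}\,y_{\lambda_{2p}}\big|.$$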
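The trace identity can also be checked by brute force. Assuming $\pi((k,\ell))e_q = e^{2\pi i\ell(q+k)/n}e_{q+k}$, the sketch below forms all products of three time-frequency shifts for $n = 3$ and compares $|\operatorname{Tr}|$ with $n$ or $0$ according to whether the shifts sum to zero modulo $n$.

```python
import cmath
from itertools import product

def tf_matrix(n, k, l):
    """Matrix of pi((k, l)), assuming pi((k, l)) e_q = exp(2 pi i l (q+k)/n) e_{q+k}."""
    P = [[0j] * n for _ in range(n)]
    for q in range(n):
        t = (q + k) % n
        P[t][q] = cmath.exp(2j * cmath.pi * l * t / n)
    return P

def matmul(A, B):
    return [[sum(A[i][r] * B[r][j] for r in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

n = 3
shifts = list(product(range(n), repeat=2))
mismatches = 0
for mus in product(shifts, repeat=3):
    P = tf_matrix(n, *mus[0])
    for mu in mus[1:]:
        P = matmul(P, tf_matrix(n, *mu))
    total = (sum(m[0] for m in mus) % n, sum(m[1] for m in mus) % n)
    expected = n if total == (0, 0) else 0  # |Tr| = n exactly when the shifts cancel
    if abs(abs(trace(P)) - expected) > 1e-9:
        mismatches += 1
print(mismatches)
```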

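Now observe that, setting $t = \lambda_1-\lambda_1'+\dots+\lambda_{2p-1}-\lambda_{2p-1}'$ and using the Cauchy-Schwarz inequality,

$$\sum_\lambda\big|\overline{x_{\lambda+t}}x_\lambda-\overline{y_{\lambda+t}}y_\lambda\big|\le\sum_\lambda|x_\lambda|\,|x_{\lambda+t}-y_{\lambda+t}|+\sum_\lambda|x_\lambda-y_\lambda|\,|y_{\lambda+t}|\le\|x\|_2\|x-y\|_2+\|x-y\|_2\|y\|_2 = (\|x\|_2+\|y\|_2)\|x-y\|_2.$$

We obtain similarly

$$\sum_{\lambda,\lambda'}\big|\overline{x_{\lambda'}}x_\lambda-\overline{y_{\lambda'}}y_\lambda\big|\le\sum_{\lambda,\lambda'}\big(|x_\lambda|\,|x_{\lambda'}-y_{\lambda'}|+|y_{\lambda'}|\,|x_\lambda-y_\lambda|\big) = (\|x\|_1+\|y\|_1)\|x-y\|_1.$$

For $x,y$ with $\operatorname{supp}x,\operatorname{supp}y\subseteq\Lambda$ for some $|\Lambda|\le s$ and $\|x\|_2 = \|y\|_2 = 1$ we have $\|x\|_1\le\sqrt s\|x\|_2 = \sqrt s$ (and similarly for $y$) as well as $\|x-y\|_1\le\sqrt s\|x-y\|_2$. Hence,

$$(\|x\|_1+\|y\|_1)\|x-y\|_1\le2s\|x-y\|_2.$$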
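Both summation inequalities are elementary, and a quick randomized check helps to catch index slips. The sketch uses hypothetical random unit vectors $x,y$ indexed by $\mathbb{Z}_n\times\mathbb{Z}_n$ on a common support $\Lambda$ with $|\Lambda| = s$, shifts indices modulo $n$ in both coordinates, and compares the two sides of each bound.

```python
import math
import random

def random_unit_vector(n, support, rng):
    """x indexed by Z_n x Z_n, supported on `support`, with ||x||_2 = 1 (hypothetical test data)."""
    x = {(k, l): 0j for k in range(n) for l in range(n)}
    for lam in support:
        x[lam] = complex(rng.gauss(0, 1), rng.gauss(0, 1))
    nrm = math.sqrt(sum(abs(v) ** 2 for v in x.values()))
    return {lam: v / nrm for lam, v in x.items()}

def pnorm(x, p):
    return sum(abs(v) ** p for v in x.values()) ** (1 / p)

def diff(x, y):
    return {lam: x[lam] - y[lam] for lam in x}

def shifted_sum(x, y, t, n):
    """sum over lam of |conj(x_{lam+t}) x_lam - conj(y_{lam+t}) y_lam|, indices mod n."""
    total = 0.0
    for (k, l) in x:
        sh = ((k + t[0]) % n, (l + t[1]) % n)
        total += abs(x[sh].conjugate() * x[(k, l)] - y[sh].conjugate() * y[(k, l)])
    return total

def double_sum(x, y):
    """sum over lam, lam' of |conj(x_lam') x_lam - conj(y_lam') y_lam|."""
    return sum(abs(x[b].conjugate() * x[a] - y[b].conjugate() * y[a]) for a in x for b in x)

rng = random.Random(0)
n, s = 4, 3
grid = [(k, l) for k in range(n) for l in range(n)]
support = rng.sample(grid, s)
x, y = random_unit_vector(n, support, rng), random_unit_vector(n, support, rng)
d2 = pnorm(diff(x, y), 2)
print(max(shifted_sum(x, y, t, n) for t in grid) / ((pnorm(x, 2) + pnorm(y, 2)) * d2))
print(double_sum(x, y) / (2 * s * d2))  # both ratios should be at most 1
```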

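This finally yields

$$d_1(x,y)^{2p}\le2^{2p}ns^{2p-1}\|x-y\|_2^{2p}$$

for such $x,y$. As this holds for all $p\in\mathbb{N}$, taking the $2p$-th root and letting $p\to\infty$ (note that $(n/s)^{1/(2p)}\to1$), we conclude that

$$d_1(x,y)\le2s\|x-y\|_2. \tag{39}$$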
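Estimate (39) can be probed numerically. The sketch below assembles $B(x)$ entrywise as $B(x)_{q',q} = (A_{q'}x)^*(A_qx)$ for $q'\ne q$, assuming $\pi((k,\ell))e_q = e^{2\pi i\ell(q+k)/n}e_{q+k}$, and bounds $d_1(x,y)$ from below by power iteration. Power iteration never overestimates the operator norm, so a violation of (39) would show up as a failed comparison. This is an illustration for one small $n$, not a verification of the lemma.

```python
import cmath
import math
import random

def apply_A(n, x, q):
    """A_q x = sum_lam x_lam pi(lam) e_q, assuming pi((k, l)) e_q = exp(2 pi i l (q+k)/n) e_{q+k}."""
    v = [0j] * n
    for (k, l), xv in x.items():
        t = (q + k) % n
        v[t] += xv * cmath.exp(2j * cmath.pi * l * t / n)
    return v

def B_of(n, x):
    """B(x)_{q',q} = (A_{q'} x)^* (A_q x) for q' != q, and 0 on the diagonal."""
    vs = [apply_A(n, x, q) for q in range(n)]
    return [[0j if qp == q else sum(a.conjugate() * b for a, b in zip(vs[qp], vs[q]))
             for q in range(n)] for qp in range(n)]

def op_norm_lower(M, iters=300, seed=0):
    """Power iteration on M^* M; returns the largest ||M v|| over unit iterates v (<= ||M||_{2->2})."""
    rng = random.Random(seed)
    n = len(M)
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    est = 0.0
    for _ in range(iters):
        nv = math.sqrt(sum(abs(c) ** 2 for c in v))
        if nv == 0:
            break
        v = [c / nv for c in v]
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]              # w = M v
        est = max(est, math.sqrt(sum(abs(c) ** 2 for c in w)))
        v = [sum(M[j][i].conjugate() * w[j] for j in range(n)) for i in range(n)]  # v = M^* w
    return est

def random_unit(n, support, rng):
    x = {lam: complex(rng.gauss(0, 1), rng.gauss(0, 1)) for lam in support}
    nrm = math.sqrt(sum(abs(v) ** 2 for v in x.values()))
    return {lam: v / nrm for lam, v in x.items()}

rng = random.Random(3)
n, s = 5, 3
grid = [(k, l) for k in range(n) for l in range(n)]
support = rng.sample(grid, s)
x, y = random_unit(n, support, rng), random_unit(n, support, rng)
Bx, By = B_of(n, x), B_of(n, y)
M = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(Bx, By)]
dist = math.sqrt(sum(abs(x[lam] - y[lam]) ** 2 for lam in support))
print(op_norm_lower(M), 2 * s * dist)  # (39) predicts the first number is at most the second
```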

With the volumetric argument, see for example [44, Proposition 10.1], we obtain the bound

$$\log N(T_s,\|\cdot\|_2,u)\le s\log(en^2/s)+s\log(1+2/u).$$

Rescaling yields

$$\log N(T_s,d_1,u)\le\log N(T_s,2s\|\cdot\|_2,u) = \log N(T_s,\|\cdot\|_2,u/(2s))\le s\log(en^2/s)+s\log(1+4su^{-1}),$$

which is the claimed inequality.

3.5 Proof of Lemma 6, Part II

Next we establish the remaining estimate of (20),

$$\log N(T_s,d_1,u)\le cu^{-2}s^2\log(2n)\log(n^2/u).$$

To this end, we use again the empirical method as in Section 3.3.

For $x\in T_s$, we define $Z_1,\dots,Z_m$ and $Z_1',\dots,Z_m'$ as in Section 3.3, that is, each takes independently the value $\|x\|_*\operatorname{sgn}(\operatorname{Re}x_\lambda)e_\lambda$ with probability $|\operatorname{Re}x_\lambda|/\|x\|_*$, and the value $i\|x\|_*\operatorname{sgn}(\operatorname{Im}x_\lambda)e_\lambda$ with probability $|\operatorname{Im}x_\lambda|/\|x\|_*$. As before, we set

$$B(Z,Z') = (Z^*W_{q',q}Z')_{q',q}, \tag{40}$$

where $W_{q',q} = A_{q'}^*A_q$ for $q'\ne q$ and $W_{q,q} = 0$, and attempt to approximate $B(x)$ with

$$\tilde B = \frac1m\sum_{j=1}^mB(Z_j,Z_j'). \tag{41}$$

That is, we will estimate $\mathbb{E}\|\tilde B-B(x)\|_{2\to2}$.

We will use symmetrization as formulated in the following lemma [44, Lemma 6.7], see also [33, Lemma 6.3], [39, Lemma 1.2.6]. Note that we will use this result with $Y_j = B(Z_j,Z_j')$.

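Lemma 9 (Symmetrization) Assume that $(Y_j)_{j=1}^m$ is a sequence of independent random vectors in $\mathbb{C}^r$ equipped with a (semi-)norm $\|\cdot\|$, having expectations $\beta_j = \mathbb{E}Y_j$. Then for $1\le p<\infty$

$$\Big(\mathbb{E}\Big\|\sum_{j=1}^m(Y_j-\beta_j)\Big\|^p\Big)^{1/p}\le2\Big(\mathbb{E}\Big\|\sum_{j=1}^m\epsilon_jY_j\Big\|^p\Big)^{1/p}, \tag{42}$$

where $(\epsilon_j)_{j=1}^m$ is a Rademacher series independent of $(Y_j)_{j=1}^m$.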
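Lemma 9 can be illustrated exactly on a toy example with scalar random variables ($r = 1$, absolute value as the norm), where all expectations are finite sums. The three distributions below are arbitrary hypothetical choices, and the function names are ours.

```python
from itertools import product

def lhs(dists, p):
    """(E|sum_j (Y_j - beta_j)|^p)^(1/p) for independent Y_j with finite laws [(prob, value), ...]."""
    betas = [sum(pr * v for pr, v in d) for d in dists]
    total = 0.0
    for combo in product(*dists):
        prob, acc = 1.0, 0.0
        for (pr, v), b in zip(combo, betas):
            prob *= pr
            acc += v - b
        total += prob * abs(acc) ** p
    return total ** (1 / p)

def rhs(dists, p):
    """2 (E|sum_j eps_j Y_j|^p)^(1/p) with an independent Rademacher sequence eps."""
    m = len(dists)
    total = 0.0
    for combo in product(*dists):
        prob = 1.0
        for pr, _ in combo:
            prob *= pr
        for signs in product((-1, 1), repeat=m):
            acc = sum(e * v for e, (_, v) in zip(signs, combo))
            total += prob * 0.5 ** m * abs(acc) ** p
    return 2 * total ** (1 / p)

dists = [[(0.5, 0.0), (0.5, 3.0)],
         [(0.9, 1.0), (0.1, -4.0)],
         [(0.2, 2.0), (0.3, 0.5), (0.5, -1.0)]]
for p in (1, 2, 4):
    print(p, lhs(dists, p), rhs(dists, p))
```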

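To estimate the $2p$-th moment of $\|\tilde B-B(x)\|_{2\to2}$, we will use the noncommutative Khintchine inequality [3, 44], which makes use of the Schatten $p$-norms introduced in (24).

Theorem 10 (Noncommutative Khintchine inequality) Let $\epsilon = (\epsilon_1,\dots,\epsilon_m)$ be a Rademacher sequence, and let $A_j$, $j = 1,\dots,m$, be complex matrices of the same dimension. Choose $p\in\mathbb{N}$. Then

$$\mathbb{E}\Big\|\sum_{j=1}^m\epsilon_jA_j\Big\|_{S_{2p}}^{2p}\le\frac{(2p)!}{2^pp!}\max\Big\{\Big\|\Big(\sum_{j=1}^mA_j^*A_j\Big)^{1/2}\Big\|_{S_{2p}}^{2p},\Big\|\Big(\sum_{j=1}^mA_jA_j^*\Big)^{1/2}\Big\|_{S_{2p}}^{2p}\Big\}. \tag{43}$$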



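Let $p\in\mathbb{N}$. We apply symmetrization with $Y_j = B(Z_j,Z_j')$, estimate the operator norm by the Schatten-$2p$-norm and apply the noncommutative Khintchine inequality (after using Fubini's theorem), to obtain

$$\big(\mathbb{E}\|\tilde B-B(x)\|_{2\to2}^{2p}\big)^{1/(2p)} = \Big(\mathbb{E}\Big\|\frac1m\sum_{j=1}^m\big(B(Z_j,Z_j')-\mathbb{E}B(Z_j,Z_j')\big)\Big\|_{2\to2}^{2p}\Big)^{1/(2p)}$$

$$\le\frac2m\Big(\mathbb{E}\Big\|\sum_{j=1}^m\epsilon_jB(Z_j,Z_j')\Big\|_{2\to2}^{2p}\Big)^{1/(2p)}\le\frac2m\Big(\mathbb{E}\Big\|\sum_{j=1}^m\epsilon_jB(Z_j,Z_j')\Big\|_{S_{2p}}^{2p}\Big)^{1/(2p)}$$

$$\le\frac2m\Big(\frac{(2p)!}{2^pp!}\Big)^{1/(2p)}\Big(\mathbb{E}\max\Big\{\Big\|\Big(\sum_{j=1}^mB(Z_j,Z_j')^*B(Z_j,Z_j')\Big)^{1/2}\Big\|_{S_{2p}}^{2p},\Big\|\Big(\sum_{j=1}^mB(Z_j,Z_j')B(Z_j,Z_j')^*\Big)^{1/2}\Big\|_{S_{2p}}^{2p}\Big\}\Big)^{1/(2p)}.$$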



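Now recall that the $Z_j,Z_j'$ may take the values $\|x\|_*p_\lambda e_\lambda$, $p_\lambda\in\{1,-1,i,-i\}$. Further, observe that $B(e_{\lambda'},e_\lambda)^* = B(e_\lambda,e_{\lambda'})$ and, for $\lambda = (k,\ell)$, $\lambda' = (k',\ell')$ with $k\ne k'$,

$$\big(B(e_{\lambda'},e_\lambda)^*B(e_{\lambda'},e_\lambda)\big)_{q,q''} = \sum_{q'}e_\lambda^*A_q^*A_{q'}e_{\lambda'}\,e_{\lambda'}^*A_{q'}^*A_{q''}e_\lambda \tag{44}$$

with

$$\sum_{q'}e_\lambda^*A_q^*A_{q'}P_{\{\lambda'\}}A_{q'}^*A_{q''}e_\lambda = e_\lambda^*A_q^*\Big(\sum_{q'}A_{q'}P_{\{\lambda'\}}A_{q'}^*\Big)A_{q''}e_\lambda = e_\lambda^*A_q^*A_{q''}e_\lambda = \langle\pi(\lambda)e_{q''},\pi(\lambda)e_q\rangle = \langle e_{q''},e_q\rangle = \delta(q''-q).$$

Here the sum in (44) may be taken over all $q'$: the terms with $q' = q$ or $q' = q''$, which are excluded in the definition of $B$ because $W_{q,q} = 0$, vanish anyway since $k\ne k'$. Therefore, $B(e_{\lambda'},e_\lambda)^*B(e_{\lambda'},e_\lambda) = I$ if $k\ne k'$, while $B(e_{\lambda'},e_\lambda) = 0$ if $k = k'$ by (37). Consequently,

$$B(Z_j,Z_j')^*B(Z_j,Z_j')\preceq\|x\|_*^4I. \tag{45}$$

Since $\|I\|_{S_{2p}}^{2p} = n$ and $\|x\|_*^2\le2s\|x\|_2^2 = 2s$, we obtain

$$\Big\|\Big(\sum_{j=1}^mB(Z_j,Z_j')^*B(Z_j,Z_j')\Big)^{1/2}\Big\|_{S_{2p}}^{2p}\le\big\|(m\|x\|_*^4I)^{1/2}\big\|_{S_{2p}}^{2p} = \|x\|_*^{4p}m^pn\le(2s)^{2p}m^pn. \tag{46}$$

By symmetry this inequality applies also to the second term in the maximum above. This yields

$$\big(\mathbb{E}\|\tilde B-B(x)\|_{2\to2}^{2p}\big)^{1/(2p)}\le\frac2m\Big(\frac{(2p)!}{2^pp!}\Big)^{1/(2p)}2s\sqrt m\,n^{1/(2p)} = \frac{4s}{\sqrt m}\Big(\frac{(2p)!}{2^pp!}\Big)^{1/(2p)}n^{1/(2p)}.$$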

Using Hölder's inequality, we can interpolate between $2p$ and $2p+2$, and an application of Stirling's formula yields for arbitrary moments $p\ge2$, see also [33],

$$\big(\mathbb{E}\|\tilde B-B(x)\|_{2\to2}^p\big)^{1/p}\le2^{3/(4p)}n^{1/p}e^{-1/2}\sqrt p\,\frac{4s}{\sqrt m}. \tag{47}$$
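The Stirling step can be made concrete for even moments. The sketch compares $\big((2p)!/(2^pp!)\big)^{1/(2p)}$, the constant from (43), with the factor $2^{3/(4q)}e^{-1/2}\sqrt q$ appearing in (47) at $q = 2p$. It covers only the even-moment case and not the Hölder interpolation for the remaining moments; the function names are hypothetical.

```python
import math

def khintchine_root(p):
    """((2p)! / (2^p p!))^(1/(2p)), computed through logarithms to avoid overflow."""
    log_c = math.lgamma(2 * p + 1) - p * math.log(2) - math.lgamma(p + 1)
    return math.exp(log_c / (2 * p))

def stirling_type_bound(q):
    """2^(3/(4q)) e^(-1/2) sqrt(q): the q-dependent factor in (47)."""
    return 2 ** (3 / (4 * q)) * math.exp(-0.5) * math.sqrt(q)

for p in (1, 2, 5, 10, 100):
    print(2 * p, khintchine_root(p), stirling_type_bound(2 * p))
```

With Robbins' form of Stirling's formula one gets $(2p)!/(2^pp!)\le\sqrt2\,(2p/e)^pe^{1/(24p)}$, which is below $2^{3/4}(2p/e)^p$ because $e^{1/(24p)}\le2^{1/4}$; this is why the comparison holds for every $p\ge1$.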

Now we use the following result relating moments and tails [32, 33].

Proposition 11 Suppose $\Xi$ is a random variable satisfying

$$(\mathbb{E}|\Xi|^p)^{1/p}\le\alpha\beta^{1/p}p^{1/\gamma}\quad\text{for all }p\ge p_0$$

for some constants $\alpha,\beta,\gamma,p_0>0$. Then

$$\mathbb{P}(|\Xi|\ge e^{1/\gamma}\alpha v)\le\beta e^{-v^\gamma/\gamma}$$

for all $v\ge p_0^{1/\gamma}$.

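Applying Proposition 11 with $p_0 = 2$, $\gamma = 2$, $\beta = 2^{3/4}n$, $\alpha = e^{-1/2}\frac{4s}{\sqrt m}$, and

$$v = \frac{u}{e^{1/2}\alpha} = u\frac{\sqrt m}{4s}\ge\sqrt2$$

gives

$$\mathbb{P}(\|\tilde B-B(x)\|_{2\to2}\ge u)\le2^{3/4}n\,e^{-\frac{mu^2}{32s^2}},\qquad u\ge4s\sqrt{2/m}.$$

In particular, if

$$m>\frac{32s^2}{u^2}\log(2^{3/4}n) \tag{48}$$

(which for $n\ge2$ also ensures $u\ge4s\sqrt{2/m}$), then the above probability is smaller than one, and there exists a matrix of the form $\frac1m\sum_{j=1}^mB(z_j,z_j')$ with $z_j,z_j'$ of the given form $\|x\|_*p_\lambda e_\lambda$ such that

$$\Big\|B(x)-\frac1m\sum_{j=1}^mB(z_j,z_j')\Big\|_{2\to2}<u.$$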



As before, we still have to discretize the prefactor $\|x\|_*$. Assume that $a$ is chosen such that $\big|\|x\|_*^2-a^2\big|\le u$. Then, similarly as in (38),

$$\Big\|\frac1m\sum_{j=1}^mB\big(a\,p_je_{\lambda_j},a\,p_j'e_{\lambda_j'}\big)-\frac1m\sum_{j=1}^mB\big(\|x\|_*p_je_{\lambda_j},\|x\|_*p_j'e_{\lambda_j'}\big)\Big\|_{2\to2}\le\big|\|x\|_*^2-a^2\big|\,\frac1m\sum_{j=1}^m\|B(e_{\lambda_j},e_{\lambda_j'})\|_{2\to2}\le u.$$

Hereby, we used $\|B(e_{\lambda_j},e_{\lambda_j'})\|_{2\to2}\le1$, which follows from (37).

As in Section 3.3 we use a discretization of $J_s = [1,2s]$ with about $K = \lceil2s/u\rceil$ elements $\alpha_1,\dots,\alpha_K$ such that for any $\beta$ in $J_s$ there exists $k$ with $|\beta-\alpha_k|\le u$. Now, provided (48) holds, for given $x$ we can find $\tilde z_1,\dots,\tilde z_m,\tilde z_1',\dots,\tilde z_m'$ of the form $\sqrt{\alpha_k}\,p_\lambda e_\lambda$, $p_\lambda\in\{1,-1,i,-i\}$, with

$$\Big\|B(x)-\frac1m\sum_{j=1}^mB(\tilde z_j,\tilde z_j')\Big\|_{2\to2}\le2u.$$

Observe as in Section 3.3 that each $\tilde z_j$ can take $4\lceil2s/u\rceil n^2$ values, so that $\frac1m\sum_{j=1}^mB(\tilde z_j,\tilde z_j')$ can take at most $(4\lceil2s/u\rceil n^2)^{2m}\le(Cn^2s/u)^{2m}$ values. As seen before, this establishes a $4u$-covering of the set of matrices $B(x)$ with $x\in T_s$ of cardinality at most $(Cn^2s/u)^{2m}$, and we conclude

$$\log N(T_s,d_1,u)\le\log\big((Cn^2s/u)^{2m}\big)\le C'\frac{s^2}{u^2}\log(2^{3/4}n)\log(Cn^2s/u)\le C''\frac{s^2}{u^2}\log(2n)\log(n^2/u).$$

This completes the proof of Lemma 6.
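The quantitative content of (48) is easy to tabulate. The sketch computes the smallest integer $m$ satisfying (48), the resulting tail bound $2^{3/4}ne^{-mu^2/(32s^2)}$, and the log-cardinality $2m\log(4\lceil2s/u\rceil n^2)$ of the covering. The parameter triples are hypothetical and the function names are ours.

```python
import math

def minimal_m(n, s, u):
    """Smallest integer m with m > (32 s^2 / u^2) log(2^(3/4) n), cf. (48)."""
    return math.floor(32 * s**2 / u**2 * math.log(2 ** 0.75 * n)) + 1

def tail_bound(n, s, m, u):
    """2^(3/4) n exp(-m u^2 / (32 s^2)); valid for u >= 4 s sqrt(2/m)."""
    return 2 ** 0.75 * n * math.exp(-m * u**2 / (32 * s**2))

def log_cardinality(n, s, u, m):
    """log of (4 ceil(2s/u) n^2)^(2m), the size of the d_1-covering."""
    return 2 * m * math.log(4 * math.ceil(2 * s / u) * n**2)

for n, s, u in [(64, 4, 1.0), (256, 8, 0.5), (1024, 16, 2.0)]:
    m = minimal_m(n, s, u)
    print(n, s, u, m, tail_bound(n, s, m, u), log_cardinality(n, s, u, m))
```

Since the tail bound is below one, at least one realization of the $Z_j,Z_j'$ achieves the deviation bound, which is all the covering argument needs.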



4 Probability estimate 

To prove Theorem 1(b), we will use the following concentration inequality, which is a slight variant of Theorem 17 in [6], which in turn is an improved version of a striking result due to Talagrand [52]. Note that with $B(x)$ as defined above, $Y$ below satisfies $\mathbb{E}Y = n\,\mathbb{E}\delta_s$.

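Theorem 12 Let $\mathcal{B} = \{B(x)\}_{x\in T}$ be a countable collection of $n\times n$ complex Hermitian matrices, and let $\epsilon = (\epsilon_1,\dots,\epsilon_n)^T$ be a sequence of i.i.d. Rademacher or Steinhaus random variables. Assume that $B(x)_{q,q} = 0$ for all $x\in T$. Let $Y$ be the random variable

$$Y = \sup_{x\in T}|\epsilon^*B(x)\epsilon| = \sup_{x\in T}\Big|\sum_{q,q'=1}^n\overline{\epsilon_{q'}}\epsilon_qB(x)_{q',q}\Big|.$$

Define $U$ and $V$ to be

$$U = \sup_{x\in T}\|B(x)\|_{2\to2}\qquad\text{and}\qquad V = \mathbb{E}\sup_{x\in T}\|B(x)\epsilon\|_2^2 = \mathbb{E}\sup_{x\in T}\sum_{q'=1}^n\Big|\sum_{q=1}^n\epsilon_qB(x)_{q',q}\Big|^2. \tag{49}$$

Then, for $\lambda>0$,

$$\mathbb{P}(Y\ge\mathbb{E}[Y]+\lambda)\le\exp\Big(-\frac{\lambda^2}{32V+65U\lambda/3}\Big). \tag{50}$$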



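Proof: For Rademacher variables, the statement is exactly Theorem 17 in [6]. For Steinhaus sequences, we provide a variation of its proof. For $\epsilon = (\epsilon_1,\dots,\epsilon_n)$, let $g_M(\epsilon) = \sum_{j,k=1}^n\overline{\epsilon_j}\epsilon_kM_{j,k}$ and set

$$Y = f(\epsilon) = \sup_{M\in\mathcal{B}}|g_M(\epsilon)|.$$

Further, for an independent copy $\tilde\epsilon_\ell$ of $\epsilon_\ell$, set $\epsilon^{(\ell)} = (\epsilon_1,\dots,\epsilon_{\ell-1},\tilde\epsilon_\ell,\epsilon_{\ell+1},\dots,\epsilon_n)$ and $Y^{(\ell)} = f(\epsilon^{(\ell)})$. Conditional on $(\epsilon_1,\dots,\epsilon_n)$, let $M = M(\epsilon)$ be the matrix giving the maximum in the definition of $Y$. (If the supremum is not attained, then one has to consider finite subsets $\mathcal{T}\subset\mathcal{B}$. The derived estimate will not depend on $\mathcal{T}$, so that one can afterwards pass over to the possibly infinite, but countable, set $\mathcal{B}$.) On the event $Y>Y^{(\ell)}$ we have $0<Y-Y^{(\ell)}\le|g_M(\epsilon)|-|g_M(\epsilon^{(\ell)})|\le|g_M(\epsilon)-g_M(\epsilon^{(\ell)})|$. Then we obtain, using $M^* = M$ and $M_{k,k} = 0$ in the last steps,

$$\mathbb{E}\big[(Y-Y^{(\ell)})^2\mathbb{1}_{Y>Y^{(\ell)}}\,\big|\,\epsilon\big]\le\mathbb{E}\big[|g_M(\epsilon)-g_M(\epsilon^{(\ell)})|^2\mathbb{1}_{Y>Y^{(\ell)}}\,\big|\,\epsilon\big]$$

$$= \mathbb{E}\Big[\Big|(\epsilon_\ell-\tilde\epsilon_\ell)\sum_{j=1,j\ne\ell}^n\overline{\epsilon_j}M_{j,\ell}+(\overline{\epsilon_\ell}-\overline{\tilde\epsilon_\ell})\sum_{k=1,k\ne\ell}^n\epsilon_kM_{\ell,k}\Big|^2\mathbb{1}_{Y>Y^{(\ell)}}\,\Big|\,\epsilon\Big]$$

$$\le4\,\mathbb{E}\big[|\epsilon_\ell-\tilde\epsilon_\ell|^2\,\big|\,\epsilon\big]\Big|\sum_{j=1}^n\overline{\epsilon_j}M_{j,\ell}\Big|^2 = 8\Big|\sum_{j=1}^n\overline{\epsilon_j}M_{j,\ell}\Big|^2.$$

The remainder of the proof is analogous to the one in [5] and therefore omitted. ■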
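The algebraic step in this proof can be checked on random data: for Hermitian $M$ with zero diagonal, $g_M(\epsilon)-g_M(\epsilon^{(\ell)}) = 2\operatorname{Re}\big((\epsilon_\ell-\tilde\epsilon_\ell)\sum_j\overline{\epsilon_j}M_{j,\ell}\big)$, which produces the factor $4$, while $\mathbb{E}|\epsilon_\ell-\tilde\epsilon_\ell|^2 = 2$ produces the factor $8$. The sketch below tests the identity and evaluates the second moment exactly for the uniform distribution on the eighth roots of unity, a discrete stand-in (our choice) for the Steinhaus law that shares the two properties used, namely unimodularity and mean zero.

```python
import cmath
import random

def g(M, eps):
    """g_M(eps) = sum_{j,k} conj(eps_j) eps_k M[j][k]."""
    n = len(M)
    return sum(eps[j].conjugate() * eps[k] * M[j][k] for j in range(n) for k in range(n))

def random_hermitian_zero_diag(n, rng):
    M = [[0j] * n for _ in range(n)]
    for j in range(n):
        for k in range(j + 1, n):
            z = complex(rng.gauss(0, 1), rng.gauss(0, 1))
            M[j][k], M[k][j] = z, z.conjugate()
    return M

def steinhaus(rng):
    return cmath.exp(2j * cmath.pi * rng.random())

rng = random.Random(1)
n, ell = 6, 2
M = random_hermitian_zero_diag(n, rng)
eps = [steinhaus(rng) for _ in range(n)]
eps_tilde = list(eps)
eps_tilde[ell] = steinhaus(rng)                 # resample only coordinate ell
diff = g(M, eps) - g(M, eps_tilde)
S = sum(eps[j].conjugate() * M[j][ell] for j in range(n))
roots = [cmath.exp(2j * cmath.pi * a / 8) for a in range(8)]
second_moment = sum(abs(a - b) ** 2 for a in roots for b in roots) / 64
print(abs(diff - 2 * ((eps[ell] - eps_tilde[ell]) * S).real), second_moment)
```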

We first note that we may pass from $T_s$ to a dense countable subset $T_s^\circ$ without changing the supremum, hence Theorem 12 is applicable. Now, it remains to estimate $U$ and $V$. To this end, note that (39), whose derivation applies also with $y = 0$, implies

$$U = \sup_{x\in T_s}\|B(x)\|_{2\to2}\le\sup_{x\in T_s}2s\|x\|_2 = 2s.$$

The remainder of this section develops an estimate of the quantity $V$ in (49). Hereby, we rely on a Dudley type inequality for Rademacher or Steinhaus processes with values in $\ell_2$, see below. First we note the following Hoeffding type inequality.



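Proposition 13 Let $\epsilon = (\epsilon_q)_{q=1}^n$ be a Steinhaus sequence and let $B\in\mathbb{C}^{n\times n}$. Then, for $u>0$,

$$\mathbb{P}(\|B\epsilon\|_2>u\|B\|_F)\le8e^{-u^2/16}. \tag{51}$$

Proof: In [46, Proposition B.1], it is shown that

$$\mathbb{P}(\|B\xi\|_2>u\|B\|_F)\le2e^{-u^2/2} \tag{52}$$

for Rademacher sequences $\xi$. This bound applies to complex $B$ as well: padding the real matrix obtained by stacking $\operatorname{Re}B$ on top of $\operatorname{Im}B$ with $n$ zero columns gives a real $2n\times2n$ matrix $\tilde B$ with $\|\tilde B\|_F = \|B\|_F$ and $\|\tilde B\tilde\xi\|_2^2 = \|(\operatorname{Re}B)\xi\|_2^2+\|(\operatorname{Im}B)\xi\|_2^2 = \|B\xi\|_2^2$, where $\tilde\xi$ extends $\xi$ by $n$ further independent Rademacher variables. We extend this result to Steinhaus sequences using the contraction principle [33, Theorem 4.4], as in the proof of Theorem 3. The entries of $\operatorname{Re}(\epsilon)$ are independent, symmetric and bounded by one in absolute value, so that [33, Theorem 4.4], applied conditionally on $(|\operatorname{Re}\epsilon_q|)_q$, implies

$$\mathbb{P}(\|B\operatorname{Re}(\epsilon)\|_2>t\|B\|_F)\le2\,\mathbb{P}(\|B\xi\|_2>t\|B\|_F)\le4e^{-t^2/2},\qquad t>0,$$

and the same bound holds for $\operatorname{Im}(\epsilon)$. Hence, since $B\epsilon = B\operatorname{Re}(\epsilon)+iB\operatorname{Im}(\epsilon)$ and therefore $\|B\epsilon\|_2\le\|B\operatorname{Re}(\epsilon)\|_2+\|B\operatorname{Im}(\epsilon)\|_2$,

$$\mathbb{P}(\|B\epsilon\|_2>u\|B\|_F)\le\mathbb{P}\big(\|B\operatorname{Re}(\epsilon)\|_2>\tfrac u2\|B\|_F\big)+\mathbb{P}\big(\|B\operatorname{Im}(\epsilon)\|_2>\tfrac u2\|B\|_F\big)\le8e^{-u^2/8}\le8e^{-u^2/16}.$$

■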
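Two elementary facts are convenient when passing from Rademacher to Steinhaus sequences: for a real vector $\xi$ one has $\|B\xi\|_2^2 = \|(\operatorname{Re}B)\xi\|_2^2+\|(\operatorname{Im}B)\xi\|_2^2$, and $\|B\epsilon\|_2\le\|B\operatorname{Re}(\epsilon)\|_2+\|B\operatorname{Im}(\epsilon)\|_2$ because $B\epsilon = B\operatorname{Re}(\epsilon)+iB\operatorname{Im}(\epsilon)$. The following sketch checks both on hypothetical random data, together with the fact that stacking $\operatorname{Re}B$ on top of $\operatorname{Im}B$ preserves the Frobenius norm.

```python
import cmath
import math
import random

def matvec(B, v):
    return [sum(B[i][j] * v[j] for j in range(len(v))) for i in range(len(B))]

def norm2(v):
    return math.sqrt(sum(abs(c) ** 2 for c in v))

def frob(B):
    return math.sqrt(sum(abs(c) ** 2 for row in B for c in row))

rng = random.Random(7)
n = 5
B = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)] for _ in range(n)]
stacked = [[c.real for c in row] for row in B] + [[c.imag for c in row] for row in B]  # 2n x n

xi = [rng.choice((-1.0, 1.0)) for _ in range(n)]                   # Rademacher
eps = [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(n)]  # Steinhaus
gap_stack = abs(norm2(matvec(B, xi)) - norm2(matvec(stacked, xi)))
gap_frob = abs(frob(B) - frob(stacked))
slack = (norm2(matvec(B, [e.real for e in eps])) + norm2(matvec(B, [e.imag for e in eps]))
         - norm2(matvec(B, eps)))
print(gap_stack, gap_frob, slack)  # the first two are rounding-level, the last is nonnegative
```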



With more effort, one may also derive (51) with better constants. Let us now estimate the quantity

$$V = \mathbb{E}\sup_{x\in T_s}\|B(x)\epsilon\|_2^2 = \mathbb{E}\sup_{x\in T_s}\sum_{q'=1}^n\Big|\sum_{q=1}^n\epsilon_qB(x)_{q',q}\Big|^2.$$

It follows immediately from Proposition 13 and (52) that the increments of the process satisfy

$$\mathbb{P}\big(\|B(x)\epsilon-B(x')\epsilon\|_2>u\|B(x)-B(x')\|_F\big)\le8e^{-u^2/16}. \tag{53}$$

This allows us to apply the following variant of Dudley's inequality for vector-valued processes in $\ell_2$.

Theorem 14 Let $R_x$, $x\in T$, be a process with values in $\mathbb{C}^m$ indexed by a metric space $(T,d)$, with increments that satisfy the subgaussian tail estimate

$$\mathbb{P}(\|R_x-R_{x'}\|_2>u\,d(x,x'))\le8e^{-u^2/16},\qquad u>0.$$

Then, for an arbitrary $x_0\in T$ and a universal constant $K>0$,

$$\Big(\mathbb{E}\sup_{x\in T}\|R_x-R_{x_0}\|_2^2\Big)^{1/2}\le K\int_0^\infty\sqrt{\log N(T,d,u)}\,du, \tag{54}$$

where $N(T,d,u)$ denotes the covering numbers of $T$ with respect to $d$ and radius $u>0$.

Proof: The proof follows literally the lines of the standard proof of Dudley's inequality for scalar-valued subgaussian processes, see for instance [44, Theorem 6.23] or [21, 33, 53]. One only has to replace the triangle inequality for the absolute value by the one for $\|\cdot\|_2$ in $\mathbb{C}^m$. ■

We have $d = d_2$ defined above, and, hence, the bound on $\int_0^\infty\sqrt{\log N(T_s,d_2,u)}\,du$ established above provides us with the right hand side of (54). Using the fact that here $R_x = B(x)\epsilon$, we conclude that

$$V = \mathbb{E}\sup_{x\in T_s}\|B(x)\epsilon\|_2^2 = \mathbb{E}\sup_{x\in T_s}\|B(x)\epsilon-B(0)\epsilon\|_2^2\le\Big(KC\sqrt{ns^{3/2}}\sqrt{\log n}\,\log s\Big)^2\le C'ns^{3/2}\log(n)\log^2(s).$$

Plugging these estimates into (50) and simplifying leads to our result, compare with [46]. In particular, Theorem 1(b) follows.